2024-03-04 12:13:07.326 [4190] INFO launcher.DefaultLauncher.run:68 [tid=679400707b7928e9] - Task start. (id:de03518e6220403e1a8402d382d64cc7,name:TRAIN)
2024-03-04 12:13:07.328 [4190] INFO launcher.DefaultLauncher.run:72 [tid=679400707b7928e9] - Agent:default
2024-03-04 12:13:07.338 [4190] INFO repository.NodeStatusRepository.executeUpdate:151 [tid=679400707b7928e9] - Report status successful.(state:RUNNING)
2024-03-04 12:13:07.339 [4190] INFO node.GenericNode.start:108 [tid=679400707b7928e9] - Node start. (id:de03518e6220403e1a8402d382d64cc7,name:TRAIN)
2024-03-04 12:13:07.347 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1067 stored as values in memory (estimated size 497.3 KiB, free 2.5 GiB)
2024-03-04 12:13:07.354 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1067_piece0 stored as bytes in memory (estimated size 49.9 KiB, free 2.5 GiB)
2024-03-04 12:13:07.354 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Created broadcast 1067 from textFile at ReadWrite.scala:587
2024-03-04 12:13:07.359 [4190] INFO mapred.FileInputFormat.listStatus:266 [tid=679400707b7928e9] - Total input files to process : 1
2024-03-04 12:13:07.364 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Starting job: first at ReadWrite.scala:587
2024-03-04 12:13:07.377 [4190] INFO scheduler.DAGScheduler.logInfo:60 [tid=679400707b7928e9] - Job 723 finished: first at ReadWrite.scala:587, took 0.013670 s
2024-03-04 12:13:07.378 [4190] INFO util.EventSerializeUtil.deserialize:106 [tid=679400707b7928e9] - Deserialization event finished,took 0.039 s
2024-03-04 12:13:07.395 [4190] INFO datasources.InMemoryFileIndex.logInfo:60 [tid=679400707b7928e9] - It took 1 ms to list leaf files for 1 paths.
2024-03-04 12:13:07.431 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Starting job: parquet at DatasetEvent.java:238
2024-03-04 12:13:07.449 [4190] INFO scheduler.DAGScheduler.logInfo:60 [tid=679400707b7928e9] - Job 724 finished: parquet at DatasetEvent.java:238, took 0.018304 s
2024-03-04 12:13:07.451 [4190] INFO util.EventSerializeUtil.deserialize:106 [tid=679400707b7928e9] - Deserialization event finished,took 0.072 s
2024-03-04 12:13:07.451 [4190] INFO util.Instrumentation.logInfo:60 [tid=679400707b7928e9] - [2c1b4188] Stage class: FPGrowth
2024-03-04 12:13:07.451 [4190] INFO util.Instrumentation.logInfo:60 [tid=679400707b7928e9] - [2c1b4188] Stage uid: fpgrowth_9cba24115c1b
2024-03-04 12:13:07.453 [4190] INFO datasources.FileSourceStrategy.logInfo:60 [tid=679400707b7928e9] - Pushed Filters:
2024-03-04 12:13:07.453 [4190] INFO datasources.FileSourceStrategy.logInfo:60 [tid=679400707b7928e9] - Post-Scan Filters:
2024-03-04 12:13:07.455 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1070 stored as values in memory (estimated size 524.9 KiB, free 2.5 GiB)
2024-03-04 12:13:07.466 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1070_piece0 stored as bytes in memory (estimated size 54.3 KiB, free 2.5 GiB)
2024-03-04 12:13:07.467 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Created broadcast 1070 from rdd at Instrumentation.scala:62
2024-03-04 12:13:07.467 [4190] INFO execution.FileSourceScanExec.logInfo:60 [tid=679400707b7928e9] - Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
2024-03-04 12:13:07.471 [4190] INFO util.Instrumentation.logInfo:60 [tid=679400707b7928e9] - [2c1b4188] training: numPartitions=1 storageLevel=StorageLevel(1 replicas)
2024-03-04 12:13:07.471 [4190] INFO util.Instrumentation.logInfo:60 [tid=679400707b7928e9] - [2c1b4188] {"itemsCol":"Collect_list_投标人id","minConfidence":0.1,"minSupport":0.3}
2024-03-04 12:13:07.474 [4190] INFO datasources.FileSourceStrategy.logInfo:60 [tid=679400707b7928e9] - Pushed Filters: IsNotNull(`Collect_list_投标人id`)
2024-03-04 12:13:07.475 [4190] INFO datasources.FileSourceStrategy.logInfo:60 [tid=679400707b7928e9] - Post-Scan Filters: isnotnull(Collect_list_投标人id#10337)
2024-03-04 12:13:07.476 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1071 stored as values in memory (estimated size 523.1 KiB, free 2.5 GiB)
2024-03-04 12:13:07.486 [4190] INFO memory.MemoryStore.logInfo:60 [tid=679400707b7928e9] - Block broadcast_1071_piece0 stored as bytes in memory (estimated size 54.1 KiB, free 2.5 GiB)
2024-03-04 12:13:07.487 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Created broadcast 1071 from rdd at FPGrowth.scala:169
2024-03-04 12:13:07.487 [4190] INFO execution.FileSourceScanExec.logInfo:60 [tid=679400707b7928e9] - Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
2024-03-04 12:13:07.494 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Starting job: count at FPGrowth.scala:178
2024-03-04 12:13:07.556 [4190] INFO scheduler.DAGScheduler.logInfo:60 [tid=679400707b7928e9] - Job 725 finished: count at FPGrowth.scala:178, took 0.061179 s
2024-03-04 12:13:07.556 [4190] INFO util.Instrumentation.logInfo:60 [tid=679400707b7928e9] - [2c1b4188] {"numExamples":9470}
2024-03-04 12:13:07.559 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Starting job: count at FPGrowth.scala:216
2024-03-04 12:13:07.565 [4190] INFO scheduler.DAGScheduler.logInfo:60 [tid=679400707b7928e9] - Job 726 finished: count at FPGrowth.scala:216, took 0.005693 s
2024-03-04 12:13:07.571 [4190] INFO spark.SparkContext.logInfo:60 [tid=679400707b7928e9] - Starting job: collect at FPGrowth.scala:255
2024-03-04 12:13:07.607 [4190] INFO scheduler.DAGScheduler.logInfo:60 [tid=679400707b7928e9] - Job 727 failed: collect at FPGrowth.scala:255, took 0.035679 s
2024-03-04 12:13:07.608 [4190] ERROR util.Instrumentation.logError:76 [tid=679400707b7928e9] - org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 880.0 failed 1 times, most recent failure: Lost task 0.0 in stage 880.0 (TID 785) (DESKTOP-R7SKOIK executor driver): org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(36361, 7906, 36361, 7906, 4660, 4660).
    at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:250)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328)
    at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:1018)
    at org.apache.spark.mllib.fpm.FPGrowth.genFreqItems(FPGrowth.scala:255)
    at org.apache.spark.mllib.fpm.FPGrowth.run(FPGrowth.scala:220)
    at org.apache.spark.ml.fpm.FPGrowth.$anonfun$genericFit$1(FPGrowth.scala:180)
    at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
    at org.apache.spark.ml.fpm.FPGrowth.genericFit(FPGrowth.scala:162)
    at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:159)
    at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:129)
    at org.apache.spark.ml.Pipeline.$anonfun$fit$5(Pipeline.scala:151)
    at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
    at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
    at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
    at org.apache.spark.ml.Pipeline.$anonfun$fit$4(Pipeline.scala:151)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at org.apache.spark.ml.Pipeline.$anonfun$fit$2(Pipeline.scala:147)
    at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
    at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
    at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
    at org.apache.spark.ml.Pipeline.$anonfun$fit$1(Pipeline.scala:133)
    at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
    at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:133)
    at smartbix.datamining.engine.execute.event.FPGrowthEvent.fit(FPGrowthEvent.java:70)
    at smartbix.datamining.engine.execute.event.FPGrowthEvent.fit(FPGrowthEvent.java:29)
    at smartbix.datamining.engine.execute.node.train.TrainNode.execute(TrainNode.java:49)
    at smartbix.datamining.engine.execute.node.GenericNode.start(GenericNode.java:118)
    at smartbix.datamining.engine.agent.execute.executor.DefaultNodeExecutor.execute(DefaultNodeExecutor.java:50)
    at smartbix.datamining.engine.agent.execute.launcher.DefaultLauncher.run(DefaultLauncher.java:79)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(36361, 7906, 36361, 7906, 4660, 4660).
    at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:250)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    ... 3 more
2024-03-04 12:13:07.608 [4190] ERROR node.GenericNode.handleExecuteError:149 [tid=679400707b7928e9] - Node execution failed.(id:de03518e6220403e1a8402d382d64cc7,name:TRAIN)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 880.0 failed 1 times, most recent failure: Lost task 0.0 in stage 880.0 (TID 785) (DESKTOP-R7SKOIK executor driver): org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(36361, 7906, 36361, 7906, 4660, 4660).
    at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:250)
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.17.jar:?]
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.17.jar:?]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at scala.Option.foreach(Option.scala:407) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:405) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.rdd.RDD.collect(RDD.scala:1018) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.mllib.fpm.FPGrowth.genFreqItems(FPGrowth.scala:255) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.mllib.fpm.FPGrowth.run(FPGrowth.scala:220) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.fpm.FPGrowth.$anonfun$genericFit$1(FPGrowth.scala:180) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.fpm.FPGrowth.genericFit(FPGrowth.scala:162) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:159) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.fpm.FPGrowth.fit(FPGrowth.scala:129) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.Pipeline.$anonfun$fit$5(Pipeline.scala:151) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.Pipeline.$anonfun$fit$4(Pipeline.scala:151) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.17.jar:?]
    at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.17.jar:?]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.ml.Pipeline.$anonfun$fit$2(Pipeline.scala:147) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.Pipeline.$anonfun$fit$1(Pipeline.scala:133) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at scala.util.Try$.apply(Try.scala:213) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:133) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at smartbix.datamining.engine.execute.event.FPGrowthEvent.fit(FPGrowthEvent.java:70) ~[EngineCommonNode-1.0.jar:?]
    at smartbix.datamining.engine.execute.event.FPGrowthEvent.fit(FPGrowthEvent.java:29) ~[EngineCommonNode-1.0.jar:?]
    at smartbix.datamining.engine.execute.node.train.TrainNode.execute(TrainNode.java:49) ~[EngineCommonNode-1.0.jar:?]
    at smartbix.datamining.engine.execute.node.GenericNode.start(GenericNode.java:118) ~[EngineCore-1.0.jar:?]
    at smartbix.datamining.engine.agent.execute.executor.DefaultNodeExecutor.execute(DefaultNodeExecutor.java:50) ~[EngineAgent-1.0.jar:?]
    at smartbix.datamining.engine.agent.execute.launcher.DefaultLauncher.run(DefaultLauncher.java:79) ~[EngineAgent-1.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202-ea]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202-ea]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202-ea]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202-ea]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202-ea]
Caused by: org.apache.spark.SparkException: Items in a transaction must be unique but got WrappedArray(36361, 7906, 36361, 7906, 4660, 4660).
    at org.apache.spark.mllib.fpm.FPGrowth.$anonfun$genFreqItems$1(FPGrowth.scala:250) ~[spark-mllib_2.12-3.4.1.jar:3.4.1]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.17.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.17.jar:?]
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.17.jar:?]
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:197) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.scheduler.Task.run(Task.scala:139) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1.jar:3.4.1]
    ... 3 more
2024-03-04 12:13:07.608 [4190] INFO flow.DistributionFlowContext.notifyNodeFail:526 [tid=679400707b7928e9] - Number of records outputed:0
2024-03-04 12:13:07.649 [4190] INFO repository.NodeStatusRepository.executeUpdate:151 [tid=679400707b7928e9] - Report status successful.(state:FAIL)
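Editor's note on the failure: Spark MLlib's FPGrowth requires every transaction to contain unique items, but the items column Collect_list_投标人id holds repeated values (WrappedArray(36361, 7906, 36361, 7906, 4660, 4660)), which is what happens when the column is built with collect_list rather than collect_set. The Scala sketch below is one possible fix, not the smartbix pipeline code: it deduplicates the items column with array_distinct (available in Spark 2.4+; the log shows Spark 3.4.1) before fitting. The column name and the minSupport/minConfidence values are taken from the log above; the sample rows are invented for illustration.

import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_distinct, col}

object FPGrowthDedupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fpgrowth-dedup-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy transactions; the first row repeats items exactly like the failing record in the log.
    val transactions = Seq(
      Seq(36361, 7906, 36361, 7906, 4660, 4660),
      Seq(36361, 4660),
      Seq(7906, 4660)
    ).toDF("Collect_list_投标人id")

    // Drop duplicate items within each transaction so FPGrowth's uniqueness check passes.
    val deduped = transactions.withColumn(
      "Collect_list_投标人id",
      array_distinct(col("Collect_list_投标人id"))
    )

    // Same estimator settings as reported by util.Instrumentation in the log.
    val fpGrowth = new FPGrowth()
      .setItemsCol("Collect_list_投标人id")
      .setMinSupport(0.3)
      .setMinConfidence(0.1)

    // Without the array_distinct step, this fit aborts with
    // "Items in a transaction must be unique but got WrappedArray(...)".
    val model = fpGrowth.fit(deduped)
    model.freqItemsets.show(false)
    model.associationRules.show(false)

    spark.stop()
  }
}

Building the column upstream with collect_set instead of collect_list achieves the same result without the extra pass, since collect_set already discards duplicate values per group.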