DEMA-3D TSP: An Enhanced Reinforcement Learning with DEMA Attention in Sequence Optimization for Safflower Picking Robot

doi:10.12133/j.smartag.SA202506004

Abstract

[Objective] Automated safflower harvesting faces several critical challenges, particularly inefficient path planning, suboptimal route quality, and limited decision-making capability in dynamic and complex environments. To address these issues, the picking path planning problem was formulated as a three-dimensional traveling salesman problem (3D TSP), and an enhanced reinforcement learning model, the actor-critic reinforcement learning pointer network (AC-RL-PtrNet), was proposed for deployment on intelligent safflower picking robots in agricultural settings. [Methods] First, to address the limitations of conventional attention mechanisms in dynamic environments with complex spatial structures, an enhanced attention module was proposed based on the dynamic exponential moving average (DEMA) framework. By combining multi-head attention, spatial distance encoding, and adaptive exponential smoothing, the improved design allowed the model to better capture long-range dependencies and spatial context among safflowers. Second, to reduce computational cost while preserving inference quality, a structured pruning approach was adopted that selectively removed redundant connections in the long short-term memory (LSTM) gates and fully connected layers. Third, the critic network was redesigned to improve learning stability and accuracy through batch normalization, residual feature aggregation, and a multi-layer value estimation head, all of which contributed to tighter actor-critic synergy during policy training. [Results and Discussions] To quantitatively assess the impact of each component, ablation experiments were conducted across various configurations. The results confirmed that each module contributed distinct benefits, while their combination yielded the largest improvements in both planning precision and inference efficiency.
This coordinated actor-critic design enhanced both trajectory quality and decision stability, which are critical in sequential robotic picking tasks. Experimental results also showed that, compared with the traditional swarm intelligence algorithms particle swarm optimization (PSO), ant colony optimization (ACO), and the non-dominated sorting genetic algorithm, the proposed AC-RL-PtrNet model achieved a planning time improvement ranging from -2.63% to 61.87% on the 25-target dataset and from 22.93% to 59.1% on the 31-target dataset. Meanwhile, the optimized paths were significantly shorter across different planning instances, indicating robust generalization across problem scales. Furthermore, field experiments validated the model's practical applicability. When deployed on a mobile picking robot in real safflower fields, AC-RL-PtrNet achieved a 9.56% reduction in path length and a 5.43% time saving for a 25-target picking task, and a 20.17% reduction in path length and a 29.70% time saving for a 31-target scenario involving a different safflower variety. Overall, these results indicated that the proposed method offers significant advantages in path planning efficiency and path quality. [Conclusions] This study offers a practical solution for efficient and robust automatic picking by safflower picking robots and provides new insights into solving 3D combinatorial optimization problems.
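
As a rough illustration of how spatial distance encoding and exponential smoothing could interact in one attention step, the following pure-Python sketch penalizes raw attention scores by 3D distance and then smooths the resulting weights across two decoding steps with an exponential moving average. It is not the AC-RL-PtrNet module: the function names, the subtractive distance penalty, the single attention head, and the fixed smoothing factor `alpha` are all assumptions made for illustration (the abstract describes the smoothing as adaptive but does not give its form).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distance_biased_weights(scores, current_xyz, targets_xyz, dist_weight=1.0):
    """Attention weights after subtracting a 3D-distance penalty (assumed form)."""
    biased = [
        s - dist_weight * math.dist(current_xyz, t)
        for s, t in zip(scores, targets_xyz)
    ]
    return softmax(biased)


def ema(previous, current, alpha):
    """Exponential moving average: alpha * current + (1 - alpha) * previous."""
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(previous, current)]


# Hypothetical targets and uniform raw scores, so only distance matters here.
targets = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
raw_scores = [0.0, 0.0, 0.0]

step1 = distance_biased_weights(raw_scores, (0.0, 0.0, 0.0), targets)
step2 = distance_biased_weights(raw_scores, (1.0, 0.0, 0.0), targets)

# Fixed alpha is an assumption; an adaptive scheme would vary it per step.
smoothed = ema(step1, step2, alpha=0.5)
print([round(w, 3) for w in smoothed])
```

Because each step's weights sum to one and the moving average is a convex combination, the smoothed weights also sum to one, so they can still be read as a distribution over targets.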

Key words: dynamic exponential moving average mechanism, structural pruning, reinforcement learning, 3D traveling salesman problem, safflower picking robot

CLC Number: S238

LI Menghao, WANG Xiaorong, LIU Zihe, DUAN Mengyu, JIN Zhengyang. DEMA-3D TSP: An Enhanced Reinforcement Learning with DEMA Attention in Sequence Optimization for Safflower Picking Robot[J]. Smart Agriculture, 2026, 8(2): 200-219.

Label	Coordinate points	Label	Coordinate points
1	(0.370 85, 0.225 16, 0.845 60)	14	(0.107 30, 0.718 10, 0.351 90)
2	(0.260 71, 0.341 52, 0.684 10)	15	(0.444 20, 0.735 80, 0.791 80)
3	(0.150 57, 0.274 46, 0.719 60)	16	(0.668 30, 0.909 50, 0.713 80)
4	(0.083 69, 0.296 23, 0.402 10)	17	(0.829 70, 0.785 10, 0.691 60)
5	(0.091 51, 0.540 71, 0.899 40)	18	(0.773 30, 0.714 30, 0.719 50)
6	(0.407 40, 0.517 10, 0.910 40)	19	(0.794 20, 0.560 00, 0.591 80)
7	(0.549 00, 0.278 40, 0.820 40)	20	(0.896 50, 0.643 30, 0.628 80)
8	(0.608 00, 0.065 50, 0.524 90)	21	(0.942 30, 0.594 00, 0.613 50)
9	(0.820 50, 0.177 80, 0.694 80)	22	(0.938 50, 0.692 60, 0.634 90)
10	(0.619 90, 0.457 80, 0.710 90)	23	(0.968 60, 0.824 70, 0.490 60)
11	(0.551 69, 0.434 36, 0.694 90)	24	(0.841 50, 0.897 50, 0.391 80)
12	(0.529 40, 0.564 48, 0.701 90)	25	(0.801 60, 0.962 40, 0.361 80)
13	(0.284 33, 0.649 31, 0.872 60)
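
Each row above gives one target as an (x, y, z) triple, so a candidate picking sequence can be scored by its total 3D Euclidean length. The sketch below is a minimal illustration using four of the listed points (labels 3 to 6). The helper names are hypothetical, the closed tour (returning to the start) is an assumption because the source does not say whether routes are open or closed, and the nearest-neighbour heuristic is only a simple reference ordering, not the proposed model.

```python
import math

# Targets 3-6 from the table above, as (x, y, z) triples.
POINTS = {
    3: (0.15057, 0.27446, 0.71960),
    4: (0.08369, 0.29623, 0.40210),
    5: (0.09151, 0.54071, 0.89940),
    6: (0.40740, 0.51710, 0.91040),
}


def tour_length(order, points, closed=True):
    """Sum of 3D Euclidean leg lengths along `order`.

    closed=True adds the leg from the last target back to the first
    (an assumption; set closed=False for an open route).
    """
    legs = list(zip(order, order[1:]))
    if closed and len(order) > 1:
        legs.append((order[-1], order[0]))
    return sum(math.dist(points[a], points[b]) for a, b in legs)


def nearest_neighbour_order(start, points):
    """Greedy ordering: always move to the closest unvisited target."""
    unvisited = set(points) - {start}
    order = [start]
    while unvisited:
        here = points[order[-1]]
        # Ties are broken by the smaller label so the result is deterministic.
        nxt = min(unvisited, key=lambda k: (math.dist(here, points[k]), k))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


greedy = nearest_neighbour_order(3, POINTS)
print(greedy, round(tour_length(greedy, POINTS), 4))
```

Any learned or heuristic planner can be compared on the same footing by passing its output sequence to `tour_length`.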

Label	Coordinate points	Label	Coordinate points
1	(0.066 90, 0.944 80, 0.220 69)	23	(0.958 28, 0.501 20, 0.077 57)
2	(0.272 09, 0.007 86, 0.789 90)	24	(0.978 76, 0.869 70, 0.144 32)
3	(0.291 00, 0.675 90, 0.498 20)	25	(0.413 78, 0.755 43, 0.050 40)
4	(0.274 50, 0.580 01, 0.828 09)	26	(0.280 54, 0.011 32, 0.336 20)
5	(0.387 98, 0.546 18, 0.486 30)	27	(0.236 89, 0.745 04, 0.409 20)
6	(0.026 03, 0.840 90, 0.239 20)	28	(0.970 50, 0.141 88, 0.509 40)
7	(0.017 18, 0.414 45, 0.101 10)	29	(0.881 24, 0.569 85, 0.649 90)
8	(0.927 67, 0.631 05, 0.318 80)	30	(0.582 91, 0.902 41, 0.734 50)
9	(0.371 76, 0.722 19, 0.901 90)	31	(0.662 84, 0.021 60, 0.092 40)
10	(0.269 97, 0.768 43, 0.741 02)	32	(0.204 40, 0.369 01, 0.800 06)
11	(0.725 02, 0.085 77, 0.862 82)	33	(0.169 60, 0.898 20, 0.854 90)
12	(0.559 95, 0.614 29, 0.183 17)	34	(0.839 21, 0.173 10, 0.265 33)
13	(0.884 94, 0.752 67, 0.490 53)	35	(0.513 48, 0.998 80, 0.585 30)
14	(0.783 94, 0.016 53, 0.096 84)	36	(0.033 25, 0.672 79, 0.615 30)
15	(0.787 99, 0.435 66, 0.371 51)	37	(0.549 80, 0.890 58, 0.660 40)
16	(0.122 35, 0.477 45, 0.504 56)	38	(0.298 60, 0.111 27, 0.308 30)
17	(0.297 22, 0.291 65, 0.551 51)	39	(0.636 40, 0.499 50, 0.296 30)
18	(0.321 45, 0.471 07, 0.879 74)	40	(0.824 54, 0.998 80, 0.981 20)
19	(0.831 73, 0.542 33, 0.048 44)	41	(0.971 44, 0.608 50, 0.847 90)
20	(0.105 04, 0.866 72, 0.470 37)	42	(0.372 60, 0.688 90, 0.952 72)
21	(0.048 52, 0.678 90, 0.475 20)	43	(0.611 60, 0.412 78, 0.091 30)
22	(0.344 70, 0.201 90, 0.167 13)

Label	Coordinate points	Label	Coordinate points
1	(0.031 29, 0.153 36, 0.352 20)	17	(0.670 69, 0.953 60, 0.498 40)
2	(0.195 88, 0.113 25, 0.634 10)	18	(0.676 65, 0.848 85, 0.684 30)
3	(0.345 74, 0.266 43, 0.769 10)	19	(0.701 89, 0.865 28, 0.642 40)
4	(0.417 69, 0.398 49, 0.586 40)	20	(0.797 98, 0.896 00, 0.891 50)
5	(0.170 80, 0.548 25, 0.781 10)	21	(0.766 78, 0.708 47, 0.688 10)
6	(0.273 79, 0.629 54, 0.697 40)	22	(0.812 71, 0.784 00, 0.581 70)
7	(0.100 57, 0.739 19, 0.826 40)	23	(0.853 47, 0.841 81, 0.724 50)
8	(0.024 39, 0.693 11, 0.597 60)	24	(0.944 39, 0.759 25, 0.573 90)
9	(0.092 73, 0.861 65, 0.837 10)	25	(0.919 31, 0.683 73, 0.649 20)
10	(0.051 98, 0.953 60, 0.887 40)	26	(0.968 68, 0.692 05, 0.637 40)
11	(0.187 26, 0.958 50, 0.682 74)	27	(0.878 55, 0.613 11, 0.695 20)
12	(0.264 38, 0.914 77, 0.599 40)	28	(0.852 53, 0.497 48, 0.673 30)
13	(0.473 02, 0.967 90, 0.726 40)	29	(0.971 19, 0.255 98, 0.488 40)
14	(0.527 73, 0.798 08, 0.822 40)	30	(0.812 71, 0.277 10, 0.683 30)
15	(0.582 28, 0.953 60, 0.742 70)	31	(0.730 42, 0.069 73, 0.268 40)
16	(0.599 52, 0.966 62, 0.705 60)

Hidden dim	Training time/s	Best path	Model size/MB
64	1 672	11.856	4.251
128	1 998	11.414	4.462
256	2 920	11.186	5.282

Method	n=20 Length/cm	n=20 Time/s	n=28 Length/cm	n=28 Time/s	n=37 Length/cm	n=37 Time/s	n=46 Length/cm	n=46 Time/s
baseline	6.769	90.83	8.524	183.26	10.299	368.36	11.806	511.05
GAT	7.399	98.46	12.064	197.85	16.638	414.79	23.665	602.34
EMSA	8.321	126.12	11.992	254.96	15.251	496.07	19.019	582.41
SeA	6.675	63.07	8.404	185.84	9.940	327.68	11.433	510.73
NoDist-DEMA	6.719	59.32	8.346	123.94	10.145	231.76	11.402	304.78
DEMA	6.674	51.01	8.302	113.74	10.058	224.29	11.398	334.62
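
The relative gains implied by the table above can be checked with a few lines of arithmetic. The sketch below compares the DEMA row with the baseline row at each problem size, with the values copied from the table. The helper names are hypothetical, and reading "baseline" as the reference configuration is an assumption based only on the row label.

```python
# Values copied from the table above for n = 20, 28, 37, 46.
SIZES = [20, 28, 37, 46]
TIME_S = {
    "baseline": [90.83, 183.26, 368.36, 511.05],
    "DEMA": [51.01, 113.74, 224.29, 334.62],
}
LENGTH_CM = {
    "baseline": [6.769, 8.524, 10.299, 11.806],
    "DEMA": [6.674, 8.302, 10.058, 11.398],
}


def percent_reduction(reference, value):
    """Reduction of `value` relative to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


def reductions(table, reference="baseline", method="DEMA"):
    """Per-size percent reduction of `method` against `reference`."""
    return [
        percent_reduction(r, v)
        for r, v in zip(table[reference], table[method])
    ]


time_saving = reductions(TIME_S)
length_saving = reductions(LENGTH_CM)

for n, t, l in zip(SIZES, time_saving, length_saving):
    print(f"n={n}: time reduced by {t:.2f}%, length reduced by {l:.2f}%")
```

For these two rows the computed time reduction lies between roughly 34% and 44%, and the length reduction between roughly 1.4% and 3.5%, which suggests the gain over the baseline is mainly in time rather than in path length.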