Smart Agriculture

DEMA-3D TSP: An Enhanced Reinforcement Learning with DEMA Attention in Sequence Optimization for Safflower Picking Robot

LI Menghao1, WANG Xiaorong1,2, LIU Zihe1, DUAN Mengyu1, JIN Zhengyang1

  1. School of Mechanical Engineering, Xinjiang University, Urumqi 830017, China
    2. Agriculture and Animal Husbandry Robot and Intelligent Equipment Engineering Research Center of Xinjiang Uygur Autonomous Region, Urumqi 830017, China
  • Received: 2025-05-19  Online: 2025-11-28
  • Foundation items: Xinjiang Uygur Autonomous Region Youth Science Foundation Project (2023D01C190); National Science and Technology Major Project of New Generation Artificial Intelligence (2022ZD0115801)
  • About author:

    LI Menghao, master's degree candidate; research interests include agricultural path planning. E-mail:

  • Corresponding author:
    WANG Xiaorong, E-mail:

Abstract:

[Objective] Critical challenges in automated safflower harvesting are addressed, particularly inefficient path planning, suboptimal route quality, and limited decision-making capability in dynamic, complex environments. To solve these issues, the task is formulated as a three-dimensional traveling salesman problem (3D TSP), and an enhanced reinforcement learning framework, the actor-critic reinforcement learning pointer network (AC-RL-PtrNet), is proposed, specifically designed for deployment on intelligent safflower picking robots in agricultural settings. [Methods] First, to address the inherent limitations of conventional attention mechanisms in dynamic environments with complex spatial structures, an enhanced attention module was proposed based on the dynamic exponential moving average (DEMA) framework. By combining multi-head attention, spatial distance encoding, and adaptive exponential smoothing, the improved design allowed the model to better capture long-range dependencies and spatial context among safflowers. Meanwhile, to minimize computational cost while preserving inference quality, a structured pruning approach was adopted that selectively removed redundant connections in the long short-term memory (LSTM) gates and fully connected layers. In parallel, the Critic network was redesigned to improve learning stability and accuracy through batch normalization, residual feature aggregation, and a multi-layer value estimation head, all of which contributed to a tighter Actor-Critic synergy during policy training. [Results and Discussions] To quantitatively assess the impact of each component, ablation experiments were conducted across various configurations. The results confirmed that each module contributed distinct benefits, while their combination yielded the highest improvements in both planning precision and inference efficiency.
This coordinated Actor-Critic design effectively enhanced both trajectory quality and decision stability, which are critical in sequential robotic picking tasks. Compared with the traditional swarm intelligence algorithms PSO, AC, and NSGA, the proposed AC-RL-PtrNet model achieved a planning time improvement ranging from -2.63% to 61.87% on the 25-target dataset and from 22.93% to 59.10% on the 31-target dataset. Meanwhile, the optimized paths were significantly shortened across different planning instances, indicating robust generalization under varied problem scales. Furthermore, field experiments validated the model's practical applicability: when deployed on a mobile picking robot in real safflower fields, AC-RL-PtrNet achieved a 9.56% reduction in path length and a 5.43% time saving for a 25-target picking task, and a 20.17% path reduction and a 29.70% time saving for a 31-target scenario involving a different safflower variety. Overall, these results indicated that the proposed method significantly improves both path planning efficiency and path quality. [Conclusions] This study offers a practical solution for efficient and robust automatic picking by safflower picking robots and provides new insights into solving 3D combinatorial optimization problems.
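The core idea of the DEMA attention module described in the Methods can be illustrated with a minimal sketch: one decoding step computes scaled dot-product attention over the encoded safflower targets, subtracts a spatial distance penalty (the spatial distance encoding), and smooths the resulting weights with an exponential moving average across decoding steps. The function name, the fixed smoothing factor `alpha`, and the penalty weight `beta` are illustrative assumptions, not the paper's exact formulation (where the smoothing is adaptive and the attention is multi-head).

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def dema_attention_step(query, keys, coords, robot_pos, prev_weights,
                        alpha=0.3, beta=1.0):
    """One decoding step of distance-aware attention with exponential
    smoothing of the attention weights (illustrative sketch only).

    query        : (d,)   current decoder state
    keys         : (n, d) encoded safflower targets
    coords       : (n, 3) 3-D target positions
    robot_pos    : (3,)   current end-effector position
    prev_weights : (n,)   attention weights from the previous step (EMA state)
    alpha        : smoothing factor (adaptive in the paper; fixed here)
    beta         : weight of the spatial distance penalty (assumed)
    """
    scores = keys @ query / np.sqrt(len(query))         # scaled dot-product
    dists = np.linalg.norm(coords - robot_pos, axis=1)  # spatial distance encoding
    weights = softmax(scores - beta * dists)            # nearer targets favored
    smoothed = alpha * weights + (1 - alpha) * prev_weights  # EMA across steps
    return smoothed / smoothed.sum()                    # renormalize to a distribution
```

Smoothing the weights over successive decoding steps is what distinguishes this from a plain pointer-network attention: abrupt changes in target preference between steps are damped, which is one plausible reading of how DEMA stabilizes the picking sequence.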

Key words: dynamic exponential moving average mechanism, structural pruning, reinforcement learning, 3D traveling salesman problem, safflower picking
