
Smart Agriculture ›› 2026, Vol. 8 ›› Issue (2): 200-219. doi: 10.12133/j.smartag.SA202506004

• Intelligent Equipment and Systems •

DEMA-3D TSP: An Enhanced Reinforcement Learning Algorithm with DEMA Attention for Sequence Optimization in Safflower Picking Robots

LI Menghao1, WANG Xiaorong1,2,3(), LIU Zihe1, DUAN Mengyu1, JIN Zhengyang1   

  1. School of Mechanical Engineering, Xinjiang University, Urumqi 830017, China
    2. Agriculture and Animal Husbandry Robot and Intelligent Equipment Engineering Research Center of Xinjiang Uygur Autonomous Region, Urumqi 830017, China
    3. Engineering Training Center of Xinjiang University, Urumqi 830017, China
  • Received: 2025-05-19 Online: 2026-03-30
  • Foundation items: Youth Science Fund of Xinjiang Uygur Autonomous Region (2023D01C190); National Science and Technology Major Project for New Generation Artificial Intelligence (2022ZD0115801)
  • About author:

    LI Menghao, Master's degree candidate, research interests: agricultural path planning. E-mail:

  • Corresponding author:
    WANG Xiaorong, Ph.D., senior engineer, research interests: precision agriculture. E-mail:


Abstract:

[Objective] Automated safflower harvesting faces several critical challenges, particularly inefficient path planning, suboptimal route quality, and limited decision-making capability in dynamic and complex environments. To address these issues, the picking path planning task was formulated as a three-dimensional traveling salesman problem (3D TSP), and an enhanced reinforcement learning model named the actor-critic reinforcement learning pointer network (AC-RL-PtrNet) was proposed, specifically designed for deployment on intelligent safflower picking robots in agricultural settings. [Methods] First, to address the inherent limitations of conventional attention mechanisms in dynamic environments with complex spatial structures, an enhanced attention module was proposed based on a dynamic exponential moving average (DEMA) framework. By combining multi-head attention, spatial distance encoding, and adaptive exponential smoothing, the improved design allowed the model to better capture long-range dependencies and spatial context among safflower targets. Meanwhile, to minimize computational cost while preserving inference quality, a structured pruning approach was adopted that selectively removed redundant connections in the long short-term memory (LSTM) gates and fully connected layers. In parallel, the critic network was redesigned to improve learning stability and accuracy through batch normalization, residual feature aggregation, and a multi-layer value estimation head, all of which contributed to tighter actor-critic synergy during policy training. [Results and Discussions] To quantitatively assess the impact of each component, ablation experiments were conducted across various configurations. The results confirmed that each module contributed distinct benefits, while their combination yielded the largest improvements in both planning precision and inference efficiency.
This coordinated actor-critic design effectively enhanced both trajectory quality and decision stability, which are critical in sequential robotic picking tasks. Experimental results also demonstrated that, compared with traditional swarm intelligence algorithms, namely particle swarm optimization (PSO), ant colony optimization (ACO), and the non-dominated sorting genetic algorithm, the proposed AC-RL-PtrNet model achieved a planning time improvement ranging from -2.63% to 61.87% on the 25-target dataset and from 22.93% to 59.10% on the 31-target dataset. Meanwhile, the optimized paths were significantly shortened across different planning instances, indicating robust generalization under varied problem scales. Furthermore, field experiments provided concrete validation of the model's practical applicability. When deployed on a mobile picking robot in real safflower fields, AC-RL-PtrNet achieved a 9.56% reduction in path length and a 5.43% time saving for a 25-target picking task, and a 20.17% path reduction and a 29.70% time saving for a 31-target scenario involving a different safflower variety. Overall, these results indicated that the proposed method offers significant advantages in improving path planning efficiency and route quality. [Conclusions] This study offers a practical solution for efficient and robust autonomous picking by safflower picking robots and provides new insights into solving 3D combinatorial optimization problems.
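The adaptive exponential smoothing described in [Methods] can be illustrated with a minimal sketch. All names and the fixed smoothing factor `alpha` below are illustrative assumptions, not the paper's implementation: at each decoding step the current attention distribution over candidate targets is blended with a running exponential moving average and renormalized, which damps abrupt shifts in attention across steps.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dema_smooth(prev_avg, weights, alpha=0.3):
    """One EMA step: blend current attention weights with the running
    average, then renormalize so the result stays a distribution."""
    blended = [alpha * w + (1 - alpha) * p for w, p in zip(weights, prev_avg)]
    s = sum(blended)
    return [b / s for b in blended]

# Toy example: attention over 4 candidate safflower targets, 2 decode steps.
avg = softmax([0.0, 0.0, 0.0, 0.0])  # start from a uniform distribution
for raw in ([2.0, 0.5, 0.1, 0.1], [0.2, 2.5, 0.1, 0.1]):
    avg = dema_smooth(avg, softmax(raw))
    print([round(x, 3) for x in avg])
```

In the full model this smoothing would sit alongside multi-head attention and spatial distance encoding; the sketch isolates only the averaging step.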

Key words: dynamic exponential moving average mechanism, structured pruning, reinforcement learning, 3D traveling salesman problem, safflower picking, robot

CLC number:
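For readers unfamiliar with the 3D TSP formulation the abstract refers to, the objective can be sketched as scoring a closed tour over 3D picking targets by total Euclidean length. The coordinates and the brute-force search below are illustrative assumptions; in the paper a learned pointer-network policy, not exhaustive search, produces the visiting order for 25-31 targets.

```python
import math
from itertools import permutations

def tour_length(points, order):
    """Total Euclidean length of a closed tour visiting 3D points in `order`."""
    total = 0.0
    for i in range(len(order)):
        a = points[order[i]]
        b = points[order[(i + 1) % len(order)]]  # wrap around to the start
        total += math.dist(a, b)
    return total

# Toy instance: 5 safflower targets as (x, y, z) coordinates in metres.
targets = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (0, 2, 1), (1, 2, 0)]

# Brute-force baseline, feasible only for tiny instances: fix target 0 as the
# start and try every ordering of the rest.
best = min(permutations(range(1, len(targets))),
           key=lambda p: tour_length(targets, (0,) + p))
print((0,) + best, round(tour_length(targets, (0,) + best), 3))
```

The factorial growth of this search is exactly why the paper replaces it with a reinforcement-learned sequencing policy.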