
Smart Agriculture


Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received: 2025-10-13 Online: 2026-01-20
  • Foundation items:
    National Key Research and Development Program of China (2024YFD2000602); Tianjin Science and Technology Support Program (23YFZCSN00290)
  • About author:

    HAO Jingtao, M.S. candidate; research interests: intelligent control theory. E-mail:

  • Corresponding authors:
    ZHAO Hui, Ph.D., professor; research interests: intelligent agricultural equipment. E-mail:
    LI Tao, Ph.D., associate researcher; research interests: intelligent perception and control of orchard picking robots. E-mail:

Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received:2025-10-13 Online:2026-01-20
  • Foundation items:National Key Research and Development Program of China(2024YFD2000602); Tianjin Science and Technology Support Program(23YFZCSN00290)
  • About author:

    HAO Jingtao, E-mail:

  • Corresponding author:
    ZHAO Hui, E-mail: ;
    LI Tao, E-mail:

Abstract (in Chinese):

[Objective/Significance] With the sharp decline in the agricultural labor force, robotic harvesting has become an urgent need for the development of the apple industry. However, orchard environments are markedly unstructured and complex, making collision-free, precise picking difficult for robots. To address the path-planning difficulties caused by obstacle heterogeneity in unstructured orchard environments, a dexterous robotic obstacle avoidance method based on recognizing the rigid or flexible attributes of obstacles is proposed. [Methods] By introducing a differentiated interaction mechanism for rigid and flexible obstacles into the reinforcement learning process, the robot autonomously learns avoidance strategies for rigid obstacles and poking maneuvers for flexible branches, effectively remedying the poor adaptability of traditional passive obstacle avoidance methods. A deep reinforcement learning architecture guided by prior knowledge was established: an expert demonstration-driven Soft Actor-Critic (SAC) method uses Kullback-Leibler (KL) divergence regularization to constrain exploration to a bounded region of the policy space, improving convergence and stability in high-dimensional continuous spaces. A MuJoCo simulation and verification platform with realistic physical properties was built, completing the verification from theoretical modeling to engineering practice. [Results and Discussions] The proposed method achieved a picking success rate of 71.7% in complex occlusion scenarios; the expert-guided strategy improved convergence speed by 75% and improved training stability. [Conclusions] This study provides a theoretical reference and technical support for intelligent robot operation in unstructured environments.

Key words (in Chinese): apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning

Abstract:

[Objective] China's apple industry ranks among the world's largest. However, labor costs account for a high proportion of total expenses in the harvesting process, and obstacle avoidance path planning for robotic arms remains a core bottleneck restricting the replacement of manual labor with robots. In orchard environments, rigid obstacles such as traction wires and thick branches are densely intertwined with flexible obstacles such as thin twigs and leaves. Traditional path planning algorithms rely on static geometric modeling, which makes it difficult for them to adapt to dynamically variable geometric boundaries and leads to long planning times and poor adaptability. Although existing deep reinforcement learning methods have been applied, they still suffer from insufficient generalization ability and low training efficiency in the differentiated handling of rigid and flexible obstacles. To address the path planning challenge posed by rigid-flexible hybrid obstacles in unstructured orchards, an obstacle avoidance method based on expert demonstration-guided deep reinforcement learning is proposed, providing theoretical and technical support for the intelligent operation of harvesting robots. [Methods] First, a high-fidelity simulation environment was constructed based on the MuJoCo physics engine, accurately modeling a 4-degree-of-freedom robotic arm, a three-fingered gripper, and rigid-flexible hybrid obstacles. The combination of rigid tree trunks and flexible branches modeled as rope models, together with precisely configured elasticity and damping parameters, reproduces realistic physical deformation. The designed state vector contains 425 dimensions of information, covering key data such as the end-effector position of the robotic arm, the fruit position, and the attributes of nearly 50 obstacle segments. 
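As a rough illustration, a state vector of this kind can be assembled by flattening per-segment obstacle attributes and concatenating them with the arm and fruit positions. The component layout below (8 attributes per segment) is a hypothetical placeholder, not the paper's exact 425-dimensional composition:

```python
import numpy as np

N_SEGMENTS = 50       # obstacle segments tracked per scene ("nearly 50")
SEGMENT_FEATURES = 8  # hypothetical: two 3D endpoints, radius, rigid/flexible flag

def build_observation(ee_pos, fruit_pos, segments):
    """Flatten arm, fruit, and obstacle information into one state vector."""
    return np.concatenate([
        np.asarray(ee_pos, dtype=float),            # end-effector position (3)
        np.asarray(fruit_pos, dtype=float),         # target fruit position (3)
        np.asarray(segments, dtype=float).ravel(),  # per-segment obstacle attributes
    ])

obs = build_observation([0.1, 0.0, 0.5], [0.4, 0.2, 0.9],
                        np.zeros((N_SEGMENTS, SEGMENT_FEATURES)))
```

A real observation would draw these quantities from the MuJoCo simulation state at every step; the flat vector is what the policy network consumes.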
The action space is discretized into 8 actions, including gripper opening/closing and six-directional displacement of the end-effector with a step size of 0.01 m. The harvesting task is decomposed into three phases: approaching, grasping, and placing. The simulation environment automatically completes phase switching and task judgment, and distinguishes rigid and flexible obstacles through labels (brown for rigid branches, green for flexible branches). Coupled with collision-detection bodies that monitor interference between the gripper and obstacles, it provides realistic scenario support for algorithm training. In terms of algorithm design, an Expert Demonstration-guided Soft Actor-Critic (EG-SAC) algorithm is proposed. A hierarchical reward function is designed, which assigns rewards according to the variation in distance between the end-effector and the target point in each phase, and imposes penalties for collisions, timeouts, and other abnormal situations. A total of 600 expert trajectories are generated through analytical calculation. For flexible occlusion scenarios, branch-poking guide points are designed to guide the robotic arm to approach the target accurately. The policy network adopts a three-layer fully connected neural network. After pre-training via behavioral cloning, it is optimized using a Hindsight Experience Replay (HER)-enhanced SAC algorithm. Kullback-Leibler (KL) divergence regularization is introduced to constrain policy deviation, thereby balancing the robustness of expert experience and the optimization potential of reinforcement learning. Orchard field tests were conducted using the research team's self-developed multi-arm harvesting robot. The test site was a standard high-density dwarf orchard. The vision system acquired image and depth information via an RGB-D sensor, detected fruits using YOLOv8, and generated 3D point clouds. Three typical scenarios were selected, with 20 fruit samples in each scenario. 
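The 8-action discretization described in this section can be sketched as a simple lookup table; the axis ordering of the six displacement actions is an assumption for illustration:

```python
import numpy as np

STEP = 0.01  # end-effector displacement per move action, in metres

# 8 discrete actions: six Cartesian displacements plus gripper open/close.
ACTIONS = {
    0: ("move", np.array([+STEP, 0.0, 0.0])),
    1: ("move", np.array([-STEP, 0.0, 0.0])),
    2: ("move", np.array([0.0, +STEP, 0.0])),
    3: ("move", np.array([0.0, -STEP, 0.0])),
    4: ("move", np.array([0.0, 0.0, +STEP])),
    5: ("move", np.array([0.0, 0.0, -STEP])),
    6: ("gripper", "open"),
    7: ("gripper", "close"),
}

def apply_action(ee_pos, gripper_open, action_id):
    """Return the updated end-effector position and gripper state."""
    kind, payload = ACTIONS[action_id]
    if kind == "move":
        return ee_pos + payload, gripper_open
    return ee_pos, payload == "open"

pos, grip = apply_action(np.zeros(3), True, 4)  # move +0.01 m along z
pos, grip = apply_action(pos, grip, 7)          # close the gripper
```

In the actual system, the chosen action would be converted to joint commands by the arm's inverse kinematics inside the MuJoCo environment.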
Scenario 1 involved simple flexible occlusion, Scenario 2 rigid-flexible hybrid occlusion, and Scenario 3 rigid occlusion. The initial poses of the robotic arm were kept consistent within the same round of tests and randomized across rounds to verify generalization ability. [Results and Discussions] In the simulation experiments, a simulation system comprising a robot mobile platform, a multi-degree-of-freedom robotic arm, and a rigid-flexible hybrid fruit tree model was built based on MuJoCo. This system realizes random generation of branch poses and fruit positions, and synchronously randomizes the initial pose of the robotic arm to ensure the diversity of training samples. Comparative experiments were set up with an EG-SAC experimental group and a standard SAC baseline group. The results show that EG-SAC converges quickly, significantly outperforming standard SAC with a 75% improvement in convergence speed, and drives the robotic arm to implement differentiated obstacle avoidance behaviors. The field test results showed that the success rates of the EG-SAC algorithm in the three scenarios were 80%, 60%, and 75%, respectively, with an average success rate of 71.7%, twice that of the RRT algorithm; the average collision rate was only 13.3%, far below the RRT algorithm's 81.7%; and the operation time ranged from 12.11 to 12.62 s, essentially comparable to that of the RRT algorithm. Failure mode analysis indicated that the main cause of RRT failures was obstacle mis-grasping, accounting for 42.0%, whereas EG-SAC failures were mostly attributable to fine-grained control issues. [Conclusions] In conclusion, this study effectively solves the obstacle avoidance challenge in unstructured orchards through a differentiated interaction mechanism for rigid-flexible obstacles, an expert demonstration-guided reinforcement learning framework, and high-fidelity simulation together with field verification. 
The research findings provide a feasible solution for the industrial application of harvesting robots. Future work can further improve the response speed through model optimization and hardware acceleration. In addition, the expert-guided framework can be extended to other complex manipulation tasks, which holds significant implications for promoting the practical application of agricultural robots.
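As a minimal sketch of the KL-divergence regularization described in the Methods, assuming a categorical policy over the discrete action set; the penalty coefficient and the form of the base loss are hypothetical, not the paper's actual EG-SAC objective:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def regularized_loss(rl_loss, policy_probs, expert_probs, beta=0.1):
    """RL loss plus a KL penalty keeping the policy near the expert policy."""
    return rl_loss + beta * kl_divergence(policy_probs, expert_probs)

uniform = [0.125] * 8          # undecided policy over the 8 discrete actions
biased = [0.44] + [0.08] * 7   # policy that strongly prefers one action
```

The penalty is zero when the learned policy matches the expert distribution and grows as the policy drifts away, which is how the regularizer bounds exploration around the demonstrated behavior.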

Key words: apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning
