
Smart Agriculture


Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received: 2025-10-13 Online: 2026-01-20
  • Foundation items:
    National Key Research and Development Program of China (2024YFD2000602); Tianjin Science and Technology Support Program (23YFZCSN00290)
  • About author:

    HAO Jingtao, M.S. candidate; research interests: intelligent control theory. E-mail:

  • Corresponding authors:
    ZHAO Hui, Ph.D., professor; research interests: intelligent agricultural equipment. E-mail:
    LI Tao, Ph.D., associate researcher; research interests: intelligent perception and control of orchard picking robots. E-mail:

Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received:2025-10-13 Online:2026-01-20
  • Foundation items:National Key Research and Development Program of China(2024YFD2000602); Tianjin Science and Technology Support Program(23YFZCSN00290)
  • About author:

    HAO Jingtao, E-mail:

  • Corresponding author:
    ZHAO Hui, E-mail: ;
    LI Tao, E-mail:

Abstract (in Chinese):

[Objective/Significance] With the sharp decline in the agricultural labor force, robotic harvesting has become an urgent need for the development of the apple industry. However, orchard environments are markedly unstructured and complex, making collision-free, precise picking difficult for robots. To address the path-planning difficulties caused by obstacle heterogeneity in unstructured orchard environments, a dexterous robotic obstacle avoidance method based on recognizing the rigid or flexible attributes of obstacles is proposed. [Methods] By introducing a differentiated interaction mechanism for rigid and flexible obstacles into the reinforcement learning process, the robot autonomously learns avoidance strategies for rigid obstacles and poking maneuvers for flexible branches, effectively remedying the poor adaptability of traditional passive obstacle avoidance methods. A deep reinforcement learning architecture guided by prior knowledge was established: an expert demonstration-driven Soft Actor-Critic (SAC) method uses Kullback-Leibler (KL) divergence regularization to constrain exploration to a bounded region of the policy space, improving convergence and stability in high-dimensional continuous spaces. A MuJoCo simulation and verification platform with realistic physical properties was built, completing the verification from theoretical modeling to engineering practice. [Results and Discussions] The proposed method achieved a picking success rate of 71.7% in complex occlusion scenarios; the expert-guided strategy improved convergence speed by 75% and improved training stability. [Conclusions] This study provides a theoretical reference and technical support for intelligent robot operation in unstructured environments.

Key words (in Chinese): apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning

Abstract:

[Objective] China's apple industry ranks among the world's largest. However, labor costs account for a high proportion of total expenses in the harvesting process, and obstacle avoidance path planning for robotic arms remains a core bottleneck restricting the replacement of manual labor with robots. In orchard environments, rigid obstacles such as traction wires and thick branches are densely intertwined with flexible obstacles such as thin twigs and leaves. Traditional path planning algorithms rely on static geometric modeling, which makes it difficult for them to adapt to dynamically variable geometric boundaries and leads to long planning times and poor adaptability. Although existing deep reinforcement learning methods have been applied, they still suffer from insufficient generalization ability and low training efficiency in the differentiated handling of rigid and flexible obstacles. To address the path planning challenge posed by rigid-flexible hybrid obstacles in unstructured orchards, an obstacle avoidance method based on expert demonstration-guided deep reinforcement learning is proposed, providing theoretical and technical support for the intelligent operation of harvesting robots. [Methods] First, a high-fidelity simulation environment was constructed based on the MuJoCo physics engine, accurately modeling a 4-degree-of-freedom robotic arm, a three-fingered gripper, and rigid-flexible hybrid obstacles. The combination of rigid tree trunks and flexible branches modeled as rope models, together with precisely configured elasticity and damping parameters, reproduces realistic physical deformation. The designed state vector contains 425 dimensions of information, covering key data such as the end-effector position of the robotic arm, the fruit position, and the attributes of nearly 50 obstacle segments. 
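As a rough illustration, a state vector of this kind can be assembled by flattening per-segment obstacle attributes and concatenating them with the arm and fruit positions. The component layout below (8 attributes per segment) is a hypothetical placeholder, not the paper's exact 425-dimensional composition:

```python
import numpy as np

N_SEGMENTS = 50       # obstacle segments tracked per scene ("nearly 50")
SEGMENT_FEATURES = 8  # hypothetical: two 3D endpoints, radius, rigid/flexible flag

def build_observation(ee_pos, fruit_pos, segments):
    """Flatten arm, fruit, and obstacle information into one state vector."""
    return np.concatenate([
        np.asarray(ee_pos, dtype=float),            # end-effector position (3)
        np.asarray(fruit_pos, dtype=float),         # target fruit position (3)
        np.asarray(segments, dtype=float).ravel(),  # per-segment obstacle attributes
    ])

obs = build_observation([0.1, 0.0, 0.5], [0.4, 0.2, 0.9],
                        np.zeros((N_SEGMENTS, SEGMENT_FEATURES)))
```

A real observation would draw these quantities from the MuJoCo simulation state at every step; the flat vector is what the policy network consumes.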
The action space is discretized into 8 actions, including gripper opening/closing and six-directional displacement of the end-effector with a step size of 0.01 m. The harvesting task is decomposed into three phases: approaching, grasping, and placing. The simulation environment automatically completes phase switching and task judgment, and distinguishes rigid and flexible obstacles through labels (brown for rigid branches, green for flexible branches). Coupled with collision-detection bodies that monitor interference between the gripper and obstacles, it provides realistic scenario support for algorithm training. In terms of algorithm design, an Expert Demonstration-guided Soft Actor-Critic (EG-SAC) algorithm is proposed. A hierarchical reward function is designed, which assigns rewards according to the variation in distance between the end-effector and the target point in each phase, and imposes penalties for collisions, timeouts, and other abnormal situations. A total of 600 expert trajectories are generated through analytical calculation. For flexible occlusion scenarios, branch-poking guide points are designed to guide the robotic arm to approach the target accurately. The policy network adopts a three-layer fully connected neural network. After pre-training via behavioral cloning, it is optimized using a Hindsight Experience Replay (HER)-enhanced SAC algorithm. Kullback-Leibler (KL) divergence regularization is introduced to constrain policy deviation, thereby balancing the robustness of expert experience and the optimization potential of reinforcement learning. Orchard field tests were conducted using the research team's self-developed multi-arm harvesting robot. The test site was a standard high-density dwarf orchard. The vision system acquired image and depth information via an RGB-D sensor, detected fruits using YOLOv8, and generated 3D point clouds. Three typical scenarios were selected, with 20 fruit samples in each scenario. 
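The 8-action discretization described in this section can be sketched as a simple lookup table; the axis ordering of the six displacement actions is an assumption for illustration:

```python
import numpy as np

STEP = 0.01  # end-effector displacement per move action, in metres

# 8 discrete actions: six Cartesian displacements plus gripper open/close.
ACTIONS = {
    0: ("move", np.array([+STEP, 0.0, 0.0])),
    1: ("move", np.array([-STEP, 0.0, 0.0])),
    2: ("move", np.array([0.0, +STEP, 0.0])),
    3: ("move", np.array([0.0, -STEP, 0.0])),
    4: ("move", np.array([0.0, 0.0, +STEP])),
    5: ("move", np.array([0.0, 0.0, -STEP])),
    6: ("gripper", "open"),
    7: ("gripper", "close"),
}

def apply_action(ee_pos, gripper_open, action_id):
    """Return the updated end-effector position and gripper state."""
    kind, payload = ACTIONS[action_id]
    if kind == "move":
        return ee_pos + payload, gripper_open
    return ee_pos, payload == "open"

pos, grip = apply_action(np.zeros(3), True, 4)  # move +0.01 m along z
pos, grip = apply_action(pos, grip, 7)          # close the gripper
```

In the actual system, the chosen action would be converted to joint commands by the arm's inverse kinematics inside the MuJoCo environment.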
Scenario 1 involved simple flexible occlusion, Scenario 2 rigid-flexible hybrid occlusion, and Scenario 3 rigid occlusion. The initial poses of the robotic arm were kept consistent within the same round of tests and randomized across rounds to verify generalization ability. [Results and Discussions] In the simulation experiments, a simulation system comprising a robot mobile platform, a multi-degree-of-freedom robotic arm, and a rigid-flexible hybrid fruit tree model was built based on MuJoCo. This system realizes random generation of branch poses and fruit positions, and synchronously randomizes the initial pose of the robotic arm to ensure the diversity of training samples. Comparative experiments were set up with an EG-SAC experimental group and a standard SAC baseline group. The results show that EG-SAC converges quickly, significantly outperforming standard SAC with a 75% improvement in convergence speed, and drives the robotic arm to implement differentiated obstacle avoidance behaviors. The field test results showed that the success rates of the EG-SAC algorithm in the three scenarios were 80%, 60%, and 75%, respectively, with an average success rate of 71.7%, twice that of the RRT algorithm; the average collision rate was only 13.3%, far below the RRT algorithm's 81.7%; and the operation time ranged from 12.11 to 12.62 s, essentially comparable to that of the RRT algorithm. Failure mode analysis indicated that the main cause of RRT failures was obstacle mis-grasping, accounting for 42.0%, whereas EG-SAC failures were mostly attributable to fine-grained control issues. [Conclusions] In conclusion, this study effectively solves the obstacle avoidance challenge in unstructured orchards through a differentiated interaction mechanism for rigid-flexible obstacles, an expert demonstration-guided reinforcement learning framework, and high-fidelity simulation together with field verification. 
The research findings provide a feasible solution for the industrial application of harvesting robots. Future work can further improve the response speed through model optimization and hardware acceleration. In addition, the expert-guided framework can be extended to other complex manipulation tasks, which holds significant implications for promoting the practical application of agricultural robots.
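As a minimal sketch of the KL-divergence regularization described in the Methods, assuming a categorical policy over the discrete action set; the penalty coefficient and the form of the base loss are hypothetical, not the paper's actual EG-SAC objective:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def regularized_loss(rl_loss, policy_probs, expert_probs, beta=0.1):
    """RL loss plus a KL penalty keeping the policy near the expert policy."""
    return rl_loss + beta * kl_divergence(policy_probs, expert_probs)

uniform = [0.125] * 8          # undecided policy over the 8 discrete actions
biased = [0.44] + [0.08] * 7   # policy that strongly prefers one action
```

The penalty is zero when the learned policy matches the expert distribution and grows as the policy drifts away, which is how the regularizer bounds exploration around the demonstrated behavior.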

Key words: apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning
