
Smart Agriculture

   

Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received: 2025-10-13 Online: 2026-01-20
  • Foundation items: National Key Research and Development Program of China (2024YFD2000602); Tianjin Science and Technology Support Program (23YFZCSN00290)
  • About author:

    HAO Jingtao, E-mail:

  • Corresponding author:
    ZHAO Hui, E-mail: ;
    LI Tao, E-mail:

Abstract:

[Objective] China's apple industry ranks among the world's largest. However, labor costs account for a high proportion of total harvesting expenses, and obstacle avoidance path planning for robotic arms remains a core bottleneck restricting the replacement of manual labor with robots. In orchard environments, rigid obstacles such as traction wires and thick branches are densely intertwined with flexible obstacles such as thin twigs and leaves. Traditional path planning algorithms rely on static geometric modeling, making it difficult for them to adapt to dynamically varying geometric boundaries and leading to long planning times and poor adaptability. Although deep reinforcement learning methods have been applied, they still suffer from insufficient generalization ability and low training efficiency when rigid and flexible obstacles must be handled differently. To address the path planning challenge posed by rigid-flexible hybrid obstacles in unstructured orchards, an obstacle avoidance method based on expert demonstration-guided deep reinforcement learning is proposed, providing theoretical and technical support for the intelligent operation of harvesting robots. [Methods] First, a high-fidelity simulation environment was constructed based on the MuJoCo physics engine, accurately modeling a 4-degree-of-freedom robotic arm, a three-fingered gripper, and rigid-flexible hybrid obstacles. Rigid tree trunks were combined with flexible branches modeled as ropes, and elasticity and damping parameters were precisely configured to reproduce realistic physical deformation. The designed state vector contains 425 dimensions, covering key data such as the end-effector position of the robotic arm, the fruit position, and the attributes of nearly 50 obstacle segments.
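A state vector of this kind can be assembled by flattening the individual observations into one array. The sketch below is illustrative only: the segment attribute layout, widths, and helper name are assumptions, not the paper's exact encoding, which reports 425 dimensions in total.

```python
import numpy as np

def build_state(ee_pos, fruit_pos, obstacle_segments):
    """Concatenate the end-effector position, fruit position, and the
    attribute vector of each obstacle segment into one flat state.
    The per-segment attribute layout here is a hypothetical example."""
    parts = [np.asarray(ee_pos, dtype=float),
             np.asarray(fruit_pos, dtype=float)]
    parts += [np.asarray(seg, dtype=float) for seg in obstacle_segments]
    return np.concatenate(parts)
```

With roughly 50 segments each carrying position, orientation, and a rigid/flexible flag, dimension counts of this form sum to a few hundred, consistent in scale with the 425 reported.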
The action space is discretized into 8 actions: gripper opening/closing and six-directional displacement of the end-effector with a step size of 0.01 m. The harvesting task is decomposed into three phases: approaching, grasping, and placing. The simulation environment automatically handles phase switching and task judgment, and distinguishes rigid from flexible obstacles through color labels (brown for rigid branches, green for flexible branches). Together with collision detection bodies that monitor interference between the gripper and obstacles, it provides realistic scenario support for algorithm training. In terms of algorithm design, an Expert Demonstration-guided Soft Actor-Critic (EG-SAC) algorithm is proposed. A hierarchical reward function is designed, which assigns rewards according to the variation in distance between the end-effector and the target point in each phase and imposes penalties for collisions, timeouts, and other abnormal situations. A total of 600 expert trajectories are generated through analytical calculation. For flexible occlusion scenarios, branch-poking guide points are designed to guide the robotic arm to approach the target accurately. The policy network adopts a three-layer fully connected neural network. After pre-training via behavioral cloning, it is optimized using the Soft Actor-Critic algorithm with Hindsight Experience Replay (HER-SAC). Kullback-Leibler (KL) divergence regularization is introduced to constrain policy deviation, thereby balancing the robustness of expert experience against the optimization potential of reinforcement learning. Orchard field tests were conducted using the research team's self-developed multi-arm harvesting robot. The test site was a standard high-density dwarf orchard. The vision system acquired image and depth information via an RGB-D sensor, detected fruits using YOLOv8, and generated 3D point clouds. Three typical scenarios were selected, with 20 fruit samples in each scenario.
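A phase-wise, distance-shaped reward of the kind described can be sketched as follows. The phase weights and penalty magnitudes below are illustrative assumptions, not the paper's actual constants.

```python
def hierarchical_reward(phase, prev_dist, curr_dist,
                        collided=False, timed_out=False):
    """Reward the reduction in end-effector-to-target distance,
    weighted by task phase; penalize abnormal terminations.
    Weights and penalties are hypothetical placeholder values."""
    if collided:
        return -10.0          # assumed collision penalty
    if timed_out:
        return -5.0           # assumed timeout penalty
    weight = {"approach": 1.0, "grasp": 2.0, "place": 1.5}[phase]
    return weight * (prev_dist - curr_dist)   # > 0 when moving closer
```

Shaping on the distance *change* rather than the absolute distance keeps the per-step reward dense while remaining zero-sum along any closed loop, which discourages the arm from oscillating near the target.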
Scenario 1 involved simple flexible occlusion, Scenario 2 rigid-flexible hybrid occlusion, and Scenario 3 rigid occlusion. The initial poses of the robotic arm were kept consistent within the same round of tests and randomized across rounds to verify generalization ability. [Results and Discussions] In simulation experiments, a simulation system comprising a robot mobile platform, a multi-degree-of-freedom robotic arm, and a rigid-flexible hybrid fruit tree model was built based on MuJoCo. This system can randomly generate branch poses and fruit positions while synchronously randomizing the initial poses of the robotic arm, ensuring the diversity of training samples. Comparative experiments were set up with EG-SAC as the experimental group and standard SAC as the baseline. The results show that EG-SAC converges 75% faster than standard SAC and drives the robotic arm to implement differentiated obstacle avoidance behaviors. The field tests showed that the success rates of the EG-SAC algorithm in the three scenarios were 80%, 60%, and 75%, respectively, with an average success rate of 71.7%, twice that of the RRT algorithm; the average collision rate was only 13.3%, far below the 81.7% of the RRT algorithm; and the operation time ranged from 12.11 to 12.62 s, comparable to that of the RRT algorithm. Failure mode analysis indicated that the main cause of RRT failures was obstacle mis-grasping, accounting for 42.0%, while EG-SAC failures were mostly attributable to fine-grained control issues. [Conclusions] This study effectively addresses the obstacle avoidance challenge in unstructured orchards through a differentiated interaction mechanism for rigid and flexible obstacles, an expert demonstration-guided reinforcement learning framework, and high-fidelity simulation with field verification.
The research findings provide a feasible solution for the industrial application of harvesting robots. Future work can further improve response speed through model optimization and hardware acceleration. In addition, the expert-guided framework can be extended to other complex manipulation tasks, which is of great significance for promoting the practical application of agricultural robots.

Key words: apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning

CLC Number: