
Smart Agriculture

   

Dexterous Obstacle Avoidance Method for Rigid and Flexible Obstacles of Apple Picking Robots

HAO Jingtao1,2, YUE Youjun1, ZHAO Zhuoqun3, ZHAO Hui1(), LI Tao2()   

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300380, China
    2. Intelligent Equipment Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. School of Mechanical Engineering, Tianjin Sino-German University of Applied Sciences, Tianjin 300350, China
  • Received: 2025-10-13 Online: 2026-01-20
  • Foundation items: National Key Research and Development Program of China (2024YFD2000602); Tianjin Science and Technology Support Program (23YFZCSN00290)
  • About author:

    HAO Jingtao, E-mail:

  • Corresponding author:
    ZHAO Hui, E-mail: ;
    LI Tao, E-mail:

Abstract:

[Objective] China's apple industry ranks among the world's largest. However, labor costs account for a high proportion of total harvesting expenses, and obstacle avoidance path planning for robotic arms remains a core bottleneck restricting the replacement of manual labor with robots. In orchard environments, rigid obstacles such as traction wires and thick branches are densely intertwined with flexible obstacles such as thin twigs and leaves. Traditional path planning algorithms rely on static geometric modeling, making it difficult for them to adapt to dynamically varying geometric boundaries and leading to long planning times and poor adaptability. Although deep reinforcement learning methods have been applied, they still suffer from insufficient generalization ability and low training efficiency when rigid and flexible obstacles must be handled differently. To address the path planning challenge posed by rigid-flexible hybrid obstacles in unstructured orchards, an obstacle avoidance method based on expert demonstration-guided deep reinforcement learning is proposed, providing theoretical and technical support for the intelligent operation of harvesting robots. [Methods] First, a high-fidelity simulation environment was constructed based on the MuJoCo physics engine, accurately modeling a 4-degree-of-freedom robotic arm, a three-fingered gripper, and rigid-flexible hybrid obstacles. Rigid tree trunks were combined with flexible branches modeled as ropes, and elasticity and damping parameters were precisely configured to reproduce realistic physical deformation. The designed state vector contains 425 dimensions, covering key data such as the end-effector position of the robotic arm, the fruit position, and the attributes of nearly 50 obstacle segments.
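A state vector of this kind can be assembled by flattening the individual observations into one array. The sketch below is illustrative only: the segment attribute layout, widths, and helper name are assumptions, not the paper's exact encoding, which reports 425 dimensions in total.

```python
import numpy as np

def build_state(ee_pos, fruit_pos, obstacle_segments):
    """Concatenate the end-effector position, fruit position, and the
    attribute vector of each obstacle segment into one flat state.
    The per-segment attribute layout here is a hypothetical example."""
    parts = [np.asarray(ee_pos, dtype=float),
             np.asarray(fruit_pos, dtype=float)]
    parts += [np.asarray(seg, dtype=float) for seg in obstacle_segments]
    return np.concatenate(parts)
```

With roughly 50 segments each carrying position, orientation, and a rigid/flexible flag, dimension counts of this form sum to a few hundred, consistent in scale with the 425 reported.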
The action space is discretized into 8 actions: gripper opening/closing and six-directional displacement of the end-effector with a step size of 0.01 m. The harvesting task is decomposed into three phases: approaching, grasping, and placing. The simulation environment automatically handles phase switching and task judgment, and distinguishes rigid from flexible obstacles through color labels (brown for rigid branches, green for flexible branches). Together with collision detection bodies that monitor interference between the gripper and obstacles, it provides realistic scenario support for algorithm training. In terms of algorithm design, an Expert Demonstration-guided Soft Actor-Critic (EG-SAC) algorithm is proposed. A hierarchical reward function is designed, which assigns rewards according to the variation in distance between the end-effector and the target point in each phase and imposes penalties for collisions, timeouts, and other abnormal situations. A total of 600 expert trajectories are generated through analytical calculation. For flexible occlusion scenarios, branch-poking guide points are designed to guide the robotic arm to approach the target accurately. The policy network adopts a three-layer fully connected neural network. After pre-training via behavioral cloning, it is optimized using the Soft Actor-Critic algorithm with Hindsight Experience Replay (HER-SAC). Kullback-Leibler (KL) divergence regularization is introduced to constrain policy deviation, thereby balancing the robustness of expert experience against the optimization potential of reinforcement learning. Orchard field tests were conducted using the research team's self-developed multi-arm harvesting robot. The test site was a standard high-density dwarf orchard. The vision system acquired image and depth information via an RGB-D sensor, detected fruits using YOLOv8, and generated 3D point clouds. Three typical scenarios were selected, with 20 fruit samples in each scenario.
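A phase-wise, distance-shaped reward of the kind described can be sketched as follows. The phase weights and penalty magnitudes below are illustrative assumptions, not the paper's actual constants.

```python
def hierarchical_reward(phase, prev_dist, curr_dist,
                        collided=False, timed_out=False):
    """Reward the reduction in end-effector-to-target distance,
    weighted by task phase; penalize abnormal terminations.
    Weights and penalties are hypothetical placeholder values."""
    if collided:
        return -10.0          # assumed collision penalty
    if timed_out:
        return -5.0           # assumed timeout penalty
    weight = {"approach": 1.0, "grasp": 2.0, "place": 1.5}[phase]
    return weight * (prev_dist - curr_dist)   # > 0 when moving closer
```

Shaping on the distance *change* rather than the absolute distance keeps the per-step reward dense while remaining zero-sum along any closed loop, which discourages the arm from oscillating near the target.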
Scenario 1 involved simple flexible occlusion, Scenario 2 rigid-flexible hybrid occlusion, and Scenario 3 rigid occlusion. The initial poses of the robotic arm were kept consistent within the same round of tests and randomized across rounds to verify generalization ability. [Results and Discussions] In simulation experiments, a simulation system comprising a robot mobile platform, a multi-degree-of-freedom robotic arm, and a rigid-flexible hybrid fruit tree model was built based on MuJoCo. This system can randomly generate branch poses and fruit positions while synchronously randomizing the initial poses of the robotic arm, ensuring the diversity of training samples. Comparative experiments were set up with EG-SAC as the experimental group and standard SAC as the baseline. The results show that EG-SAC converges 75% faster than standard SAC and drives the robotic arm to implement differentiated obstacle avoidance behaviors. The field tests showed that the success rates of the EG-SAC algorithm in the three scenarios were 80%, 60%, and 75%, respectively, with an average success rate of 71.7%, twice that of the RRT algorithm; the average collision rate was only 13.3%, far below the 81.7% of the RRT algorithm; and the operation time ranged from 12.11 to 12.62 s, comparable to that of the RRT algorithm. Failure mode analysis indicated that the main cause of RRT failures was obstacle mis-grasping, accounting for 42.0%, while EG-SAC failures were mostly attributable to fine-grained control issues. [Conclusions] This study effectively addresses the obstacle avoidance challenge in unstructured orchards through a differentiated interaction mechanism for rigid and flexible obstacles, an expert demonstration-guided reinforcement learning framework, and high-fidelity simulation with field verification.
The research findings provide a feasible solution for the industrial application of harvesting robots. Future work can further improve response speed through model optimization and hardware acceleration. In addition, the expert-guided framework can be extended to other complex manipulation tasks, which is of great significance for promoting the practical application of agricultural robots.

Key words: apple harvesting, rigid-flexible mixture, MuJoCo, expert demonstration, reinforcement learning

CLC Number: