
Smart Agriculture


Research on a Lightweight Apple Instance Segmentation Algorithm Based on SSW-YOLOv11n for Complex Orchard Environments

HAN Wenkai1, LI Tao2, FENG Qingchun2, CHEN Liping1,2

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
  2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-05-03  Online: 2025-09-23
  • Foundation items: National Key Research and Development Program of China (2024YFD2000602); Science and Technology Program of Tianjin, China (23YFZCSN00290); Youth Research Foundation of Beijing Academy of Agriculture and Forestry Sciences, China (QNJJ202318); Beijing Nova Program, China (20220484023)
  • About author:
    HAN Wenkai, E-mail:
  • Corresponding author:
    CHEN Liping, E-mail:

Abstract:

[Objective] In complex orchard environments, accurate fruit detection and segmentation are critical for autonomous apple-picking robots. Environmental factors severely degrade fruit visibility, challenging instance segmentation models across diverse field conditions. Apple-picking robots operate on embedded edge-computing platforms with stringent constraints on processing power, memory, and energy consumption. These limited computational resources preclude high-complexity deep-learning architectures and require segmentation models to balance accuracy with real-time throughput and resource efficiency. This study introduces SSW-YOLOv11n, a lightweight instance segmentation model derived from YOLOv11n and tailored to orchard environments. Through three core design enhancements, SSW-YOLOv11n maintains high mask accuracy under adverse conditions such as variable lighting, irregular occlusion, and background clutter, while delivering accelerated inference on resource-limited edge devices.

[Methods] The SSW-YOLOv11n model first introduced the GSConv and VoVGSCSP modules into its neck network, constructing a compact yet computationally efficient "Slim-Neck" architecture. By integrating GSConv, an operation that combines grouped spatial convolution with channel shuffling, and VoVGSCSP, a cross-stage partial module optimized for balanced depth and width, the model substantially reduced its floating-point operations while enriching its feature representations. The optimized neck enabled more effective multi-scale information fusion, ensuring that semantic features of target regions were extracted comprehensively without compromising the model's lightweight design. Subsequently, the SimAM attention mechanism was embedded at multiple output interfaces between the backbone and the neck. SimAM leveraged a parameter-free, energy-based weighting strategy to dynamically amplify critical feature responses and suppress irrelevant background activations, increasing the model's sensitivity to fruit targets in complex, cluttered orchard scenes. Finally, the original bounding-box regression loss was replaced with Wise-IoU, which incorporated a dynamic weighting scheme based on center-point distance and geometric discrepancy, further refining box regression and improving localization precision and stability under variable environmental conditions. Collectively, these three modifications endowed the model with strong instance-segmentation performance and deployment adaptability, offering a transferable design paradigm for deep-learning-based vision systems on resource-constrained agricultural robots.

[Results and Discussions] Experimental results demonstrated that SSW-YOLOv11n achieved a Box mAP50 of 76.3% and a Mask mAP50 of 76.7%, improvements of 1.7 and 2.4 percentage points, respectively, over the baseline YOLOv11n model. The proposed model reduced computational complexity from 10.4 GFLOPs to 9.1 GFLOPs (a 12.5% reduction) and shrank the model weight file from 5.89 MB to 4.55 MB (a 22.8% reduction), demonstrating significant efficiency gains. These results indicate that the synergistic integration of lightweight architecture design and attention mechanisms effectively addresses the trade-off between model complexity and segmentation accuracy.
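To make the parameter-free SimAM weighting described in [Methods] concrete, the following is a minimal PyTorch sketch of the standard SimAM formulation: the energy of each activation is computed from its squared deviation from the channel mean, normalized by the channel variance, and mapped to a weight through a sigmoid. This is an illustrative reference rather than the authors' released code; the regularizer value e_lambda and the exact insertion points at the backbone-neck interfaces are assumptions.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: each activation is re-weighted by an
    energy-based importance score computed from its own feature map."""

    def __init__(self, e_lambda: float = 1e-4):  # regularizer value is an assumed default
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map, e.g. a backbone output fed to the neck
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of every activation from its per-channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Per-channel variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: distinctive activations get larger weights,
        # flat background regions are suppressed
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

# Example: re-weight a hypothetical 64-channel, 80x80 feature map
feat = torch.randn(1, 64, 80, 80)
refined = SimAM()(feat)  # same shape, no learnable parameters added
```

Because the module adds no learnable parameters, placing it at several backbone-neck interfaces leaves the parameter count and weight file size unchanged, which is consistent with the lightweight goal stated above.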
Comparative experiments showed that SSW-YOLOv11n outperformed Mask R-CNN, SOLO, YOLACT, and YOLOv11n, with Mask mAP50 improvements of 23.2, 20.3, 21.4, and 2.4 percentage points, respectively, demonstrating substantial advantages in segmentation precision within unstructured orchard environments. The superior performance over these baseline methods suggests that the proposed approach successfully adapts deep-learning architectures to agricultural scenarios with complex environmental conditions. Edge deployment testing on the NVIDIA Jetson TX2 platform achieved an inference rate of 29.8 FPS, an 18.7% improvement over YOLOv11n (25.1 FPS), validating the model's real-time performance and its suitability for resource-constrained agricultural robotics.

[Conclusions] SSW-YOLOv11n effectively enhanced fruit-target segmentation accuracy while reducing computational overhead, providing a robust technical foundation for the practical application of autonomous apple-picking robots. By addressing the dual requirements of high-precision perception and efficient inference on constrained hardware, the proposed approach advances the state of the art in intelligent agricultural robotics and offers a scalable solution for large-scale orchard automation.
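The [Methods] section replaces the bounding-box regression loss with Wise-IoU, weighted by center-point distance and geometric discrepancy, but the abstract does not state which WIoU variant is used. The sketch below shows WIoU v1, the simplest published variant, whose focusing factor is an exponential of the squared center distance normalized by the detached squared diagonal of the smallest enclosing box; the (x1, y1, x2, y2) box format and the mean reduction are assumptions made for illustration only.

```python
import torch

def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Wise-IoU v1 box loss for (N, 4) tensors in (x1, y1, x2, y2) format (illustrative sketch)."""
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Plain IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between predicted and ground-truth box centers
    dist2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
          + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4

    # Squared diagonal of the smallest enclosing box, detached so the
    # focusing factor rescales the loss without receiving its own gradient
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    diag2 = (cw.pow(2) + ch.pow(2)).detach() + eps

    # Distance-based focusing factor amplifies the loss for badly centered boxes
    r_wiou = torch.exp(dist2 / diag2)
    return (r_wiou * (1.0 - iou)).mean()

# Example with two hypothetical predicted/ground-truth box pairs
pred = torch.tensor([[10., 10., 50., 50.], [0., 0., 20., 40.]], requires_grad=True)
gt   = torch.tensor([[12., 12., 48., 52.], [2., 0., 22., 38.]])
loss = wiou_v1_loss(pred, gt)
loss.backward()
```

Detaching the enclosing-box diagonal keeps the focusing factor from contributing its own gradient, so it only rescales the IoU term for poorly centered predictions, which matches the dynamic weighting behaviour described in the abstract.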

Key words: deep learning, apple harvesting, lightweight design, instance segmentation, YOLOv11

CLC Number: