
Smart Agriculture ›› 2025, Vol. 7 ›› Issue (5): 114-123. doi: 10.12133/j.smartag.SA202505002

• Special Issue: Innovative Technologies and Applications of Optical Smart Agriculture •

Lightweight Apple Instance Segmentation Algorithm Based on Improved YOLOv11n for Complex Orchard Environments

HAN Wenkai1, LI Tao2, FENG Qingchun2, CHEN Liping1,2()

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
    2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-05-03 Online: 2025-09-30
  • Foundation items:
    National Key Research and Development Program of China (2024YFD2000602); Science and Technology Program of Tianjin (23YFZCSN00290); Youth Research Foundation of Beijing Academy of Agriculture and Forestry Sciences (QNJJ202318); Beijing Nova Program (20220484023)
  • About author:

    HAN Wenkai, master's student; research direction: computer vision. E-mail:

  • Corresponding author:
    CHEN Liping, Ph.D., research fellow; research direction: precision agriculture technology and equipment. E-mail:

Lightweight Apple Instance Segmentation Algorithm Based on SSW-YOLOv11n for Complex Orchard Environments

HAN Wenkai1, LI Tao2, FENG Qingchun2, CHEN Liping1,2()   

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
    2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-05-03 Online: 2025-09-30
  • Foundation items: National Key Research and Development Program of China (2024YFD2000602); Science and Technology Program of Tianjin (23YFZCSN00290); Youth Research Foundation of Beijing Academy of Agriculture and Forestry Sciences (QNJJ202318); Beijing Nova Program (20220484023)
  • About author:

    HAN Wenkai, E-mail:

  • Corresponding author:
    CHEN Liping, E-mail:

Abstract:

[Objective/Significance] Accurate recognition of fruit targets in complex orchard environments is a key prerequisite for apple-picking robot operation. However, the varied illumination and occlusion conditions found in orchards place severe demands on a model's generalization ability, while the limited computing resources of the robot's edge platform also challenge the computational efficiency of the visual segmentation model. To address this, this study proposes SSW-YOLOv11n, a lightweight instance segmentation model based on the YOLOv11n architecture, which improves fruit-mask segmentation accuracy under differing working conditions as well as inference efficiency on the edge side. [Methods] First, SSW-YOLOv11n introduces Group Shuffle Convolution (GSConv) and the VoVGSCSP cross-stage partial module into the neck network, reducing computation while improving accuracy and feature representation, thereby constructing a lightweight Slim-Neck structure for efficient feature fusion. Second, the Simple, Parameter-Free Attention Module (SimAM) is inserted at the three outputs connecting the backbone and neck networks to weight forward-propagated features and strengthen the model's perception of key regions. Finally, the Wise Intersection over Union (Wise-IoU) loss replaces the original loss function; by introducing a combined weighting mechanism based on distance and geometric factors, it effectively optimizes bounding-box regression. [Results and Discussions] Experimental results show that, compared with the original YOLOv11n model, SSW-YOLOv11n improves Box mAP50 and Mask mAP50 by 1.7 and 2.4 percentage points, respectively, while reducing computation and model weight by 12.5% and 22.8%; on the edge-side NVIDIA Jetson TX2 platform, the model reaches an inference rate of 29.8 FPS, an 18.7% improvement over YOLOv11n. [Conclusions] These results validate the effectiveness of the proposed method in improving segmentation accuracy while reducing computational overhead, providing a technical basis for the practical application of apple-picking robots.
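The GSConv operation underlying the Slim-Neck relies on a channel shuffle that interleaves the channels produced by its standard-convolution and depthwise branches so information mixes between them. A minimal, framework-agnostic sketch of that shuffle step (NumPy only; the function name and shapes are illustrative, not taken from the paper):

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups, as used inside GSConv.

    x: feature map of shape (N, C, H, W); C must be divisible by groups.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # (N, C, H, W) -> (N, groups, C//groups, H, W) -> swap group axes -> flatten
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# With 4 channels and 2 groups, channel order 0,1,2,3 becomes 0,2,1,3,
# so channels from the two convolution branches alternate.
demo = np.arange(4, dtype=float).reshape(1, 4, 1, 1)
print(channel_shuffle(demo, 2).reshape(-1))  # [0. 2. 1. 3.]
```

The shuffle itself is a pure reshape/transpose, which is why GSConv can approximate a dense convolution's cross-channel mixing at a fraction of its floating-point cost.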

Keywords: deep learning, apple harvesting, lightweight design, instance segmentation, YOLOv11

Abstract:

[Objective] In complex orchard environments, accurate fruit detection and segmentation are critical for autonomous apple-picking robots. Environmental factors severely degrade fruit visibility, challenging instance segmentation models across diverse field conditions. Apple-picking robots operate on embedded edge-computing platforms with stringent constraints on processing power, memory, and energy consumption. Limited computational resources preclude high-complexity deep-learning architectures, requiring segmentation models to balance real-time throughput and resource efficiency. This study introduces SSW-YOLOv11n, a lightweight instance segmentation model derived from YOLOv11n and tailored to orchard environments. SSW-YOLOv11n maintains high mask accuracy under adverse conditions—variable lighting, irregular occlusion, and background clutter—while delivering accelerated inference on resource-limited edge devices through three core design enhancements. [Methods] The SSW-YOLOv11n model first introduced GSConv and VoVGSCSP modules into its neck network, thereby constructing a highly compact yet computationally efficient "Slim-Neck" architecture. By integrating GSConv—an operation that employs grouped spatial convolutions and channel-shuffle techniques—and VoVGSCSP—a cross-stage partial module optimized for balanced depth and width—the model substantially reduced its overall floating-point operations while concurrently enhancing the richness of its feature representations. This optimized neck design facilitated more effective multi-scale information fusion, ensuring that semantic features corresponding to target regions were extracted comprehensively, all without compromising the model's lightweight nature. Subsequently, the authors embedded the SimAM self-attention mechanism at multiple output interfaces between the backbone and neck subnets. 
SimAM leveraged a parameter-free energy-based weighting strategy to dynamically amplify critical feature responses and suppress irrelevant background activations, thereby augmenting the model's sensitivity to fruit targets amid complex, cluttered orchard scenes. Finally, the original bounding-box regression loss was replaced with Wise-IoU, which incorporated a dynamic weighting scheme based on both center-point distance and geometric discrepancy factors. This modification further refined the regression process, improving localization precision and stability under variable environmental conditions. Collectively, these three innovations synergistically endowed the model with superior instance-segmentation performance and deployment adaptability, offering a transferable design paradigm for implementing deep-learning-based vision systems on resource-constrained agricultural robots. [Results and Discussions] Experimental results demonstrated that SSW-YOLOv11n achieved Box mAP50 and Mask mAP50 of 76.3% and 76.7%, respectively, representing improvements of 1.7 and 2.4 percentage points over the baseline YOLOv11n model. The proposed model reduced computational complexity from 10.4 to 9.1 GFLOPs (12.5% reduction) and achieved a model weight of 4.55 MB compared to 5.89 MB for the baseline (22.8% reduction), demonstrating significant efficiency gains. These results indicate that the synergistic integration of lightweight architecture design and attention mechanisms effectively addresses the trade-off between model complexity and segmentation accuracy. Comparative experiments showed that SSW-YOLOv11n outperformed Mask R-CNN, SOLO, YOLACT, and YOLOv11n with Mask mAP50 improvements of 23.2, 20.3, 21.4, and 2.4 percentage points, respectively, evidencing substantial advantages in segmentation precision within unstructured orchard environments. 
The superior performance over traditional methods suggests that the proposed approach successfully adapts deep learning architectures to agricultural scenarios with complex environmental conditions. Edge-deployment testing on the NVIDIA Jetson TX2 platform achieved a 29.8 FPS inference rate, an 18.7% improvement over YOLOv11n (25.1 FPS), validating the model's real-time performance and suitability for resource-constrained agricultural robotics applications. [Conclusions] SSW-YOLOv11n effectively enhanced fruit-target segmentation accuracy while reducing computational overhead, thus providing a robust technical foundation for the practical application of autonomous apple-picking robots. By addressing the dual imperatives of high-precision perception and efficient inference within constrained hardware contexts, the proposed approach advanced the state of the art in intelligent agricultural robotics and offered a scalable solution for large-scale orchard automation.
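The parameter-free, energy-based weighting that the abstract attributes to SimAM can be stated in a few lines: each activation is gated by a sigmoid of an inverse-energy term built from its squared deviation from the channel's spatial mean. A minimal NumPy sketch of that published formulation (the regularizer `lam` and tensor shapes here are illustrative assumptions):

```python
import numpy as np

def simam(x: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM-style attention over a (N, C, H, W) feature map.

    Activations far from their channel's spatial mean receive larger gates,
    amplifying salient responses without any learnable parameters.
    """
    n_pix = x.shape[2] * x.shape[3] - 1               # spatial size minus one
    mu = x.mean(axis=(2, 3), keepdims=True)           # per-channel spatial mean
    d = (x - mu) ** 2                                 # squared deviation
    var = d.sum(axis=(2, 3), keepdims=True) / n_pix   # per-channel variance
    e_inv = d / (4.0 * (var + lam)) + 0.5             # inverse energy term
    return x * (1.0 / (1.0 + np.exp(-e_inv)))         # sigmoid gating
```

Because the gate is computed from the features themselves, the module adds no learnable weights, which is consistent with inserting it at the three backbone-to-neck outputs without increasing model size.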

Key words: deep learning, apple harvesting, lightweight design, instance segmentation, YOLOv11

CLC number: