
Smart Agriculture


Research on a Lightweight Apple Instance Segmentation Algorithm Based on SSW-YOLOv11n for Complex Orchard Environments

HAN Wenkai1, LI Tao2, FENG Qingchun2, CHEN Liping1,2

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
  2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-05-03  Online: 2025-09-23
  • Foundation items: National Key Research and Development Program of China (2024YFD2000602); Science and Technology Program of Tianjin, China (23YFZCSN00290); Youth Research Foundation of Beijing Academy of Agriculture and Forestry Sciences, China (QNJJ202318); Beijing Nova Program, China (20220484023)
  • About author:
    HAN Wenkai, E-mail:
  • Corresponding author:
    CHEN Liping, E-mail:

Abstract:

[Objective] In complex orchard environments, accurate fruit detection and segmentation are critical for autonomous apple-picking robots. Environmental factors severely degrade fruit visibility, challenging instance segmentation models across diverse field conditions. Apple-picking robots operate on embedded edge-computing platforms with stringent constraints on processing power, memory, and energy consumption. These limited computational resources preclude high-complexity deep-learning architectures and require segmentation models to balance accuracy with real-time throughput and resource efficiency. This study introduces SSW-YOLOv11n, a lightweight instance segmentation model derived from YOLOv11n and tailored to orchard environments. Through three core design enhancements, SSW-YOLOv11n maintains high mask accuracy under adverse conditions such as variable lighting, irregular occlusion, and background clutter, while delivering accelerated inference on resource-limited edge devices.

[Methods] The SSW-YOLOv11n model first introduced the GSConv and VoVGSCSP modules into its neck network, constructing a compact yet computationally efficient "Slim-Neck" architecture. By integrating GSConv, an operation that combines grouped spatial convolution with channel shuffling, and VoVGSCSP, a cross-stage partial module optimized for balanced depth and width, the model substantially reduced its floating-point operations while enriching its feature representations. The optimized neck enabled more effective multi-scale information fusion, ensuring that semantic features of target regions were extracted comprehensively without compromising the model's lightweight design. Subsequently, the SimAM attention mechanism was embedded at multiple output interfaces between the backbone and the neck. SimAM leveraged a parameter-free, energy-based weighting strategy to dynamically amplify critical feature responses and suppress irrelevant background activations, increasing the model's sensitivity to fruit targets in complex, cluttered orchard scenes. Finally, the original bounding-box regression loss was replaced with Wise-IoU, which incorporated a dynamic weighting scheme based on center-point distance and geometric discrepancy, further refining box regression and improving localization precision and stability under variable environmental conditions. Collectively, these three modifications endowed the model with strong instance-segmentation performance and deployment adaptability, offering a transferable design paradigm for deep-learning-based vision systems on resource-constrained agricultural robots.

[Results and Discussions] Experimental results demonstrated that SSW-YOLOv11n achieved a Box mAP50 of 76.3% and a Mask mAP50 of 76.7%, improvements of 1.7 and 2.4 percentage points, respectively, over the baseline YOLOv11n model. The proposed model reduced computational complexity from 10.4 GFLOPs to 9.1 GFLOPs (a 12.5% reduction) and shrank the model weight file from 5.89 MB to 4.55 MB (a 22.8% reduction), demonstrating significant efficiency gains. These results indicate that the synergistic integration of lightweight architecture design and attention mechanisms effectively addresses the trade-off between model complexity and segmentation accuracy.
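To make the parameter-free SimAM weighting described in [Methods] concrete, the following is a minimal PyTorch sketch of the standard SimAM formulation: the energy of each activation is computed from its squared deviation from the channel mean, normalized by the channel variance, and mapped to a weight through a sigmoid. This is an illustrative reference rather than the authors' released code; the regularizer value e_lambda and the exact insertion points at the backbone-neck interfaces are assumptions.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: each activation is re-weighted by an
    energy-based importance score computed from its own feature map."""

    def __init__(self, e_lambda: float = 1e-4):  # regularizer value is an assumed default
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map, e.g. a backbone output fed to the neck
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of every activation from its per-channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Per-channel variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: distinctive activations get larger weights,
        # flat background regions are suppressed
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

# Example: re-weight a hypothetical 64-channel, 80x80 feature map
feat = torch.randn(1, 64, 80, 80)
refined = SimAM()(feat)  # same shape, no learnable parameters added
```

Because the module adds no learnable parameters, placing it at several backbone-neck interfaces leaves the parameter count and weight file size unchanged, which is consistent with the lightweight goal stated above.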
Comparative experiments showed that SSW-YOLOv11n outperformed Mask R-CNN, SOLO, YOLACT, and YOLOv11n, with Mask mAP50 improvements of 23.2, 20.3, 21.4, and 2.4 percentage points, respectively, demonstrating substantial advantages in segmentation precision within unstructured orchard environments. The superior performance over these baseline methods suggests that the proposed approach successfully adapts deep-learning architectures to agricultural scenarios with complex environmental conditions. Edge deployment testing on the NVIDIA Jetson TX2 platform achieved an inference rate of 29.8 FPS, an 18.7% improvement over YOLOv11n (25.1 FPS), validating the model's real-time performance and its suitability for resource-constrained agricultural robotics.

[Conclusions] SSW-YOLOv11n effectively enhanced fruit-target segmentation accuracy while reducing computational overhead, providing a robust technical foundation for the practical application of autonomous apple-picking robots. By addressing the dual requirements of high-precision perception and efficient inference on constrained hardware, the proposed approach advances the state of the art in intelligent agricultural robotics and offers a scalable solution for large-scale orchard automation.
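The [Methods] section replaces the bounding-box regression loss with Wise-IoU, weighted by center-point distance and geometric discrepancy, but the abstract does not state which WIoU variant is used. The sketch below shows WIoU v1, the simplest published variant, whose focusing factor is an exponential of the squared center distance normalized by the detached squared diagonal of the smallest enclosing box; the (x1, y1, x2, y2) box format and the mean reduction are assumptions made for illustration only.

```python
import torch

def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Wise-IoU v1 box loss for (N, 4) tensors in (x1, y1, x2, y2) format (illustrative sketch)."""
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Plain IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between predicted and ground-truth box centers
    dist2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
          + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4

    # Squared diagonal of the smallest enclosing box, detached so the
    # focusing factor rescales the loss without receiving its own gradient
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    diag2 = (cw.pow(2) + ch.pow(2)).detach() + eps

    # Distance-based focusing factor amplifies the loss for badly centered boxes
    r_wiou = torch.exp(dist2 / diag2)
    return (r_wiou * (1.0 - iou)).mean()

# Example with two hypothetical predicted/ground-truth box pairs
pred = torch.tensor([[10., 10., 50., 50.], [0., 0., 20., 40.]], requires_grad=True)
gt   = torch.tensor([[12., 12., 48., 52.], [2., 0., 22., 38.]])
loss = wiou_v1_loss(pred, gt)
loss.backward()
```

Detaching the enclosing-box diagonal keeps the focusing factor from contributing its own gradient, so it only rescales the IoU term for poorly centered predictions, which matches the dynamic weighting behaviour described in the abstract.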

Key words: deep learning, apple harvesting, lightweight design, instance segmentation, YOLOv11

CLC Number: