A Lightweight Method for Pear Surface Defect Detection Based on Improved Mamba-YOLO Architecture

doi: 10.12133/j.smartag.SA202508022

Smart Agriculture, 2026, Vol. 8, Issue 2: 147-157.

• Information Processing and Decision Making •

XIU Xianchao¹, FEI Shiqi¹,²,³, HUANG Wenqian²,³, LI Nan¹, MIAO Zhonghua¹

1. School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China

Received: 2025-08-21 Online: 2026-03-30
Foundation items: National Key Research and Development Program of China (2024YFB4707400); Shanghai Key Science and Technology Project (24N32800100)
About the author:
XIU Xianchao, Ph.D., associate professor; research interests: artificial intelligence and embodied intelligence. E-mail: xcxiu@shu.edu.cn
Corresponding author:
LI Nan, Ph.D., lecturer; research interests: intelligent equipment and robotics. E-mail: linan2019@shu.edu.cn

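Abstract

Summary:

[Objective/Significance] To address the poor detection accuracy caused by the small scale of surface defects on Dangshan pears, this study proposes a lightweight, high-precision model based on an improved Mamba-YOLO, aiming to balance detection accuracy and efficiency. [Methods] First, a dynamic upsampling module is adopted. Compared with the upsampling module of the existing Mamba-YOLO, it has fewer parameters and floating-point operations, so it preserves the model's computational efficiency while improving the retention of defect detail. Second, a frequency-adaptive dilated convolution is proposed. By dynamically adjusting the convolution kernel size, it lets the network adaptively select a matching kernel according to the local features of the input, which strengthens the extraction of defect features. Finally, the squeeze-and-excitation module is fused with the channel-mixer convolutional gated linear unit, and convolution kernels of several sizes are introduced to extract multi-scale features, further improving the model's robustness and its ability to capture local detail. [Results and Discussion] Evaluated on the Dangshan pear test set, the improved algorithm reached a mean average precision of 95.1% and a frame rate of 72 frames/s. Compared with YOLOv8n, Gold-YOLO-N, and YOLOv12n, its mean average precision was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared with the baseline Mamba-YOLO-T, its mean average precision rose by 3.4 percentage points and its frame rate increased by 10.8%. [Conclusions] The improved model raises overall detection performance while reducing computational complexity and parameter count, and can provide reliable algorithmic support for research on lightweight pear surface defect detection.

Keywords: Mamba-YOLO, defect detection, image recognition, frequency-adaptive dilated convolution, convolution kernel, dynamic upsampling module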

Abstract:

[Objective] Pears are a widely consumed fruit rich in vitamins and minerals. Traditional pear grading relies primarily on manual inspection, which is laborious and subjective, so its results are unstable and inaccurate. Manual handling can also cause physical damage to the fruit, reducing its appearance quality and market value. An automated, efficient, and reliable pear grading technology is therefore urgently needed by the industry. To address the poor detection accuracy caused by the small scale of surface defects on Dangshan pears, a lightweight, high-precision model based on an improved Mamba-YOLO architecture was proposed, aiming to balance detection accuracy and efficiency.

[Methods] The dataset comprised 1 000 images, partitioned into training, validation, and test sets in an 8:1:1 ratio. Three improvements were made to the network architecture. First, a dynamic upsampling (Dysample) module was adopted. Compared with the existing upsampling module in Mamba-YOLO, the Dysample module has fewer parameters and floating-point operations (FLOPs). Its design eliminates complex dynamic convolution kernels and requires only a small number of linear layers and grouping operations, preserving computational efficiency while improving the retention of defect details. Second, pear surface defects often exhibit high-frequency local features, whereas traditional convolutional neural networks (CNNs) suffer from insufficient feature capture and an imbalanced frequency response. As the dilation rate increases, the frequency response of the convolution kernel decreases and its bandwidth narrows, which limits its ability to process high-frequency information. A frequency-adaptive dilated convolution (FADC) module was therefore proposed. It dynamically adjusts the convolution kernel size so that the network adaptively selects a matching kernel based on local input features: smaller kernels in high-frequency regions and larger kernels in low-frequency regions. This achieves collaborative optimization of multi-band features and strengthens defect feature extraction. Finally, single-scale depthwise convolutions alone may perceive too little of the input feature information when capturing local features, and traditional gating mechanisms may model global context inadequately. The squeeze-and-excitation module was therefore fused with a channel mixer based on the convolutional gated linear unit (CGLU), and the combination was extended into a multi-scale version termed MS-CGLU. Convolution kernels of different sizes extract multi-scale features, which are then fused by weighting to obtain a stronger feature representation.

[Results and Discussions] The proposed method was evaluated on the Dangshan pear test set. Ablation experiments showed that introducing CGLU, FADC, and Dysample improved detection performance, confirming the effectiveness of these modules. Compared with YOLOv8n, Gold-YOLO-N, and YOLOv12n, the mean average precision (mAP) was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared with the baseline Mamba-YOLO-T, the mAP increased by 3.4 percentage points and the frame rate increased by 10.8%. In comparative experiments with larger models from the same Mamba-YOLO series, the proposed algorithm retained clear advantages: its parameter count was only 41.7% of that of Mamba-YOLO-B and 15.7% of that of Mamba-YOLO-L, and its FLOPs were only 57.1% and 18.1% of theirs, yet it achieved gains in mAP@0.5 of 3.2 and 1.4 percentage points and in mAP@0.5:0.95 of 3.1 and 2.6 percentage points, respectively.

[Conclusions] This research developed a high-precision, lightweight algorithm for detecting surface defects on Dangshan pears. It achieved a favorable balance between detection accuracy and inference speed, surpassing the compared lightweight benchmarks in accuracy and the larger models of its own family in both accuracy and efficiency. This work can provide reliable algorithmic support for research on lightweight detection of pear surface defects.
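
The bandwidth argument in [Methods] can be illustrated with a toy case. The sketch below is not the FADC module. It assumes a 1-D, 3-tap averaging kernel (an illustrative choice, not taken from the paper) whose taps are spaced `dilation` samples apart, for which the magnitude response is |1 + 2cos(dω)|/3. All function names are hypothetical.

```python
import math


def dilated_box_response(omega: float, dilation: int) -> float:
    """Magnitude response of the kernel [1, 1, 1] / 3 with taps `dilation` samples apart.

    The taps sit at -d, 0, +d, so H(omega) = (1 + 2 * cos(d * omega)) / 3.
    """
    return abs(1.0 + 2.0 * math.cos(dilation * omega)) / 3.0


def first_null(dilation: int) -> float:
    """Lowest frequency (rad/sample) at which the response reaches zero."""
    # 1 + 2 * cos(d * omega) = 0  =>  d * omega = 2 * pi / 3
    return 2.0 * math.pi / (3.0 * dilation)


def half_power_cutoff(dilation: int) -> float:
    """Lowest frequency at which the response falls to 1 / sqrt(2) of its DC value."""
    # (1 + 2 * cos(d * omega)) / 3 = 1 / sqrt(2), solved for omega
    return math.acos((3.0 / math.sqrt(2.0) - 1.0) / 2.0) / dilation


if __name__ == "__main__":
    for d in (1, 2, 4):
        print(f"dilation={d}: cutoff={half_power_cutoff(d):.3f} rad/sample, "
              f"first null={first_null(d):.3f} rad/sample")
```

In this toy case both the cutoff and the first null scale as 1/d, so doubling the dilation halves the width of the main lobe. That is the sense in which a larger dilation rate leaves less room for high-frequency content.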

Key words: Mamba-YOLO, defect detection, image recognition, frequency-adaptive dilated convolution (FADC), convolutional kernels, dynamic upsampling (Dysample)

CLC number: TP391

XIU Xianchao, FEI Shiqi, HUANG Wenqian, LI Nan, MIAO Zhonghua. A Lightweight Method for Pear Surface Defect Detection Based on Improved Mamba-YOLO Architecture[J]. Smart Agriculture, 2026, 8(2): 147-157.



Model	FADC	CGLU	Dysample	Precision/%	Recall/%	mAP_0.5/%	mAP_0.5:0.95/%	F₁ score/%	FPS/(frames/s)
B	×	×	×	95.7	83.9	91.7	53.2	89.4	65
B+Dysample	×	×	√	92.6	92.9	92.9	53.5	92.7	100
B+FADC	√	×	×	93.1	90.4	93.1	54.7	91.7	56
B+CGLU	×	√	×	97.1	88.4	92.2	53.7	92.5	38
B+CGLU+Dysample	×	√	√	95.5	87.4	93.8	53.9	91.3	51
B+FADC+Dysample	√	×	√	88.4	92.0	92.3	54.0	90.2	57
B+FADC+CGLU	√	√	×	93.7	89.2	93.0	55.2	91.4	41
B+FADC+CGLU+Dysample	√	√	√	95.1	91.1	95.1	56.6	93.1	72
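
As a consistency check on the table above, the F₁ column can be recomputed from the precision and recall columns, and the frame-rate gain can be expressed as a relative change. This is a minimal sketch, assuming the standard harmonic-mean definition of F₁. The function names are hypothetical, and the numbers are copied from the first and last rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


def relative_gain_pct(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (precision/%, recall/%, FPS) copied from the first and last rows of the table.
rows = {
    "B": (95.7, 83.9, 65),
    "B+FADC+CGLU+Dysample": (95.1, 91.1, 72),
}

if __name__ == "__main__":
    for name, (p, r, fps) in rows.items():
        print(f"{name}: F1 = {f1_score(p, r):.1f}%")
    base_fps, full_fps = rows["B"][2], rows["B+FADC+CGLU+Dysample"][2]
    print(f"FPS gain: {relative_gain_pct(base_fps, full_fps):.1f}%")
```

The recomputed F₁ values round to the tabulated 89.4 and 93.1. The frame-rate change from 65 to 72 frames/s is a relative gain of about 10.8%, which is a percentage rather than a percentage-point difference.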

Model	mAP_0.5/%	mAP_0.5:0.95/%	Parameters/M	Computation/GFLOPs
YOLOv5n	86.8	49.1	1.9	4.5
YOLOv6n	90.3	48.4	4.7	4.7
YOLOv7-tiny	90.7	49.3	6.2	13.7
YOLOv8n	90.4	50.5	3.2	34.1
Gold-YOLO-N	89.8	51.0	5.6	12.1
YOLOv12n	88.8	51.9	2.6	6.5
Mamba-YOLO-FC	95.1	56.6	9.1	28.4

Model	mAP_0.5/%	mAP_0.5:0.95/%	Parameters/M	Computation/GFLOPs
Mamba-YOLO-T	91.7	53.2	6.1	14.3
Mamba-YOLO-B	91.9	53.5	21.8	49.7
Mamba-YOLO-L	94.2	54.0	57.6	156.2
Mamba-YOLO-FC	95.1	56.6	9.1	28.4
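
The parameter and FLOPs shares quoted in the abstract can be re-derived from the table above. The sketch below assumes that the Mamba-YOLO-FC row is the proposed model, since its values match those reported for it, and that the quoted shares were truncated rather than rounded to one decimal place. Both are assumptions, and the helper names are hypothetical.

```python
import math

# (parameters/M, computation/GFLOPs) copied from the table.
models = {
    "Mamba-YOLO-B": (21.8, 49.7),
    "Mamba-YOLO-L": (57.6, 156.2),
    "Mamba-YOLO-FC": (9.1, 28.4),
}


def share_pct(part: float, whole: float) -> float:
    """`part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


def truncate_1dp(value: float) -> float:
    """Drop everything after the first decimal place (no rounding)."""
    return math.floor(value * 10.0) / 10.0


if __name__ == "__main__":
    fc_params, fc_flops = models["Mamba-YOLO-FC"]
    for name in ("Mamba-YOLO-B", "Mamba-YOLO-L"):
        params, flops = models[name]
        print(f"vs {name}: parameters {truncate_1dp(share_pct(fc_params, params))}%, "
              f"FLOPs {truncate_1dp(share_pct(fc_flops, flops))}%")
```

Under these assumptions the sketch reproduces the 41.7% and 15.7% parameter shares and the 57.1% and 18.1% FLOPs shares. With conventional rounding the Mamba-YOLO-L figures would come out as 15.8% and 18.2%.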

Feature extraction module	Precision/%	Recall/%	mAP_0.5/%	mAP_0.5:0.95/%	F₁ score/%	Computation/GFLOPs
C2f	90.9	93.6	93.4	54.5	92.2	33.1
C3	94.0	85.8	92.8	54.3	89.7	23.6
ODSS	95.7	83.9	91.7	53.2	89.4	14.3
FCSS	95.1	91.1	95.1	55.6	93.1	28.4
