Smart Agriculture

基于轻量化Mamba-YOLO模型的梨表面缺陷检测方法

修贤超1, 费士祺1,2,3, 黄文倩2,3, 李楠1, 苗中华1

  1. 上海大学 机电工程与自动化学院,上海 200444,中国
    2. 北京市农林科学院智能装备技术研究中心,北京 100097,中国
    3. 北京市农林科学院信息技术研究中心,北京 100097,中国
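  • Received: 2025-08-21 Published: 2025-12-11
  • Funding:
    National Key Research and Development Program of China (2024YFB4707400); Shanghai Key Science and Technology Project (24N32800100)
  • About the author:

    修贤超 (XIU Xianchao), Ph.D., associate professor; research interests: artificial intelligence and embodied intelligence. E-mail:

  • Corresponding author:
    李楠 (LI Nan), Ph.D., lecturer; research interests: intelligent equipment and robotics. E-mail: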

Lightweight Mamba-YOLO Based Approach for Pear Surface Defect Detection

XIU Xianchao1, FEI Shiqi1,2,3, HUANG Wenqian2,3, LI Nan1, MIAO Zhonghua1

  1. School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
    2. Intelligent Equipment Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-08-21 Online: 2025-12-11
  • Foundation items: National Key Research and Development Program of China (2024YFB4707400); Shanghai Key Science and Technology Project (24N32800100)
  • About author:

    XIU Xianchao, E-mail:

  • Corresponding author:
    LI Nan, E-mail:

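Abstract (Chinese version):

[Objective/Significance] To address the poor detection accuracy caused by the small scale of surface defects on Dangshan pears, this study proposes a lightweight, high-precision model based on an improved Mamba-YOLO, aiming to balance detection accuracy and efficiency. [Methods] First, a dynamic upsampling (Dysample) module is adopted. Compared with the existing upsampling module in Mamba-YOLO, it has fewer parameters and floating-point operations, so it improves the retention of defect detail while preserving computational efficiency. Second, a frequency-adaptive dilated convolution (FADC) is proposed, which dynamically adjusts the convolution kernel size so that the network adaptively selects a matching kernel according to local input features, thereby strengthening defect feature extraction. Finally, the squeeze-and-excitation (SE) module is fused with the channel-mixer convolutional gated linear unit (CGLU), and convolution kernels of multiple sizes are introduced to extract multi-scale features, further improving the model's robustness and its capture of local detail. [Results and Discussion] Evaluated on the Dangshan pear test set, the improved algorithm reached a mean average precision (mAP) of 95.1% and a detection speed of 72 frames/s. Compared with YOLOv8n, Gold-YOLO-N, and YOLOv12n, its mAP was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared with the baseline Mamba-YOLO-T, its mAP rose by 3.4 percentage points and its frame rate (frames per second, FPS) improved by 10.8 percentage points. [Conclusions] The improved model raises overall detection performance while reducing computational complexity and parameter count, and can provide reliable algorithmic support for research on lightweight pear surface defect detection.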
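The reported gaps are absolute differences in percentage points, which is not the same as a relative gain in percent. The short sketch below works through that arithmetic using only the figures quoted in the abstract (95.1% mAP and the four gaps). It assumes every gap refers to the same mAP metric on the same test set; the "implied" baseline values are derived here for illustration and are not figures reported by the authors. The function names are hypothetical.

```python
# Figures quoted in the abstract: proposed-model mAP (percent) and the
# reported gaps to each compared model (percentage points).
PROPOSED_MAP = 95.1
GAPS_PP = {
    "YOLOv8n": 4.7,
    "Gold-YOLO-N": 5.3,
    "YOLOv12n": 6.3,
    "Mamba-YOLO-T": 3.4,
}


def implied_baseline(proposed: float, gap_pp: float) -> float:
    """Baseline mAP implied by an absolute gap in percentage points."""
    return round(proposed - gap_pp, 1)


def relative_gain_percent(proposed: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return round(100.0 * (proposed - baseline) / baseline, 2)


if __name__ == "__main__":
    for name, gap in GAPS_PP.items():
        base = implied_baseline(PROPOSED_MAP, gap)
        rel = relative_gain_percent(PROPOSED_MAP, base)
        # Assumption: the gap and the 95.1% figure use the same metric.
        print(f"{name}: implied mAP {base}%, gap {gap} pp, relative gain {rel}%")
```

Under these assumptions the relative gain is always slightly larger than the percentage-point gap, because the baseline it is divided by is below 100%.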

Key words (Chinese version): Dangshan pear, lightweight YOLO, defect detection, image recognition

Abstract:

[Objective] Pears are a common fruit rich in vitamins and minerals. Traditional pear grading relies mainly on manual inspection, which is laborious and subjective, so its results are unstable and inaccurate. Manual handling can also damage the fruit, reducing its appearance and market value. The industry therefore urgently needs an automated, efficient, and reliable pear grading technology. To address the poor detection accuracy caused by the small scale of surface defects on Dangshan pears, a lightweight, high-precision model based on an improved Mamba-YOLO architecture is proposed, aiming to balance detection accuracy and efficiency.

[Methods] To improve training precision and generalization, blurred and low-quality images were removed manually. The final dataset comprised 1 000 images, partitioned into training, validation, and test sets in an 8:1:1 ratio. Data augmentation, including rotation, cropping, mirroring, and brightness adjustment, was applied to the dataset to improve training effectiveness. Three improvements were made to the network architecture. Firstly, a dynamic upsampling (Dysample) module was adopted. Compared with the existing upsampling module in Mamba-YOLO, the Dysample module had fewer parameters and floating-point operations (FLOPs). Its design eliminated complex dynamic convolution kernels and required only a small number of linear layers and grouping operations, preserving computational efficiency while improving the retention of defect details. Secondly, pear surface defects often exhibited high-frequency local features, whereas traditional convolutional neural networks (CNNs) suffered from insufficient feature capture and an imbalanced frequency response. As the dilation rate increased, the frequency response of the convolution kernel decreased and its bandwidth narrowed, limiting its ability to process high-frequency information. A frequency-adaptive dilated convolution (FADC) module was therefore proposed, which dynamically adjusted the convolution kernel size so that the network adaptively selected matching kernels based on local input features. Smaller kernels were used in high-frequency regions and larger kernels in low-frequency regions, achieving collaborative optimization of multi-band features and strengthening defect feature extraction. Finally, single-scale depthwise convolutions alone might perceive the input features insufficiently, and traditional gating mechanisms might model global context inadequately. The squeeze-and-excitation (SE) module was therefore fused with a channel mixer based on the convolutional gated linear unit (CGLU), and the combination was extended into a multi-scale version termed MS-CGLU. Convolutional kernels of different sizes extracted multi-scale features, which were then combined by weighted fusion to give a stronger feature representation.

[Results and Discussions] The proposed algorithm was evaluated on the Dangshan pear test set. Ablation experiments showed that introducing the CGLU, FADC, and Dysample modules each improved detection performance, confirming their effectiveness. Compared with YOLOv8n, Gold-YOLO-N, and YOLOv12n, the mean average precision (mAP) was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared with the baseline Mamba-YOLO-T, the mAP increased by 3.4 percentage points and the frames per second (FPS) improved by 10.8 percentage points. In comparative experiments with the larger models of the same Mamba-YOLO series, the proposed algorithm still showed significant advantages: its parameter count was only 41.7% of that of Mamba-YOLO-B and 15.7% of that of Mamba-YOLO-L, and its FLOPs were only 57.1% and 18.1% of those of the respective models, yet it achieved increases in mAP@0.5 of 3.2% and 1.4% and in mAP@0.5:0.95 of 3.1% and 2.6%, respectively.

[Conclusions] This study developed a high-precision, lightweight algorithm for detecting surface defects on Dangshan pears. It achieved a superior balance between detection accuracy and inference speed, significantly outperforming the relevant lightweight benchmarks, and even the larger models within its own family, in terms of efficiency. This work can provide reliable algorithmic support for research on lightweight detection of pear surface defects.
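The statement in the Methods that a larger dilation rate narrows the kernel's bandwidth can be made concrete with a standard signal-processing identity. The derivation below is a 1-D simplification offered as an illustration, not the authors' own analysis; the symbols h, D, H, and B are introduced here and are not taken from the paper.

```latex
% 1-D sketch: h is a convolution kernel, D >= 1 its dilation rate.
\begin{align}
H(\omega) &= \sum_{n} h[n]\, e^{-j\omega n}, \\
h_D[n] &=
\begin{cases}
h[n/D], & D \mid n, \\
0, & \text{otherwise},
\end{cases} \\
H_D(\omega) &= \sum_{m} h[m]\, e^{-j\omega D m} = H(D\omega).
\end{align}
```

Dilation therefore compresses the frequency response by a factor of D: a passband of width B centred on zero frequency in H shrinks to width B/D in H_D, with replicas repeating every 2π/D. This is consistent with using small dilation (or small kernels) where the local content is high-frequency.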

Key words: Dangshan pear, lightweight YOLO, defect detection, image recognition
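The 8:1:1 partition of the 1 000 images described in the abstract can be sketched as follows. This is a minimal illustration assuming a seeded random shuffle. The abstract does not say how images were assigned to each subset, and the file names and the `split_dataset` helper are hypothetical.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items reproducibly and split them by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 1 000 images.
image_names = [f"pear_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 800 100 100
```

Keeping the three subsets disjoint, and fixing the seed, means that reported test-set metrics are computed on images the model never saw during training.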

CLC number: