
Smart Agriculture ›› 2026, Vol. 8 ›› Issue (1): 1-14. DOI: 10.12133/j.smartag.SA202506005

• Special Topic: Intelligent Recognition and Diagnosis of Agricultural Diseases and Pests •

  • Corresponding author:
    YAO Xiaotong, PhD, Associate Professor; research interests include the Internet of Things and intelligent detection, robotics and vision control, and big data and artificial intelligence. E-mail:

Lightweight Detection Method for Pepper Leaf Diseases and Pests Based on Improved YOLOv12s

YAO Xiaotong, QU Shaoye

  1. School of Electronics and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China
  • Received: 2025-06-03  Online: 2026-01-30
  • Foundation items: National Natural Science Foundation of China (51567014); Gansu Provincial Science and Technology Program (22JR5RA797)


Abstract:

[Objective] Pepper cultivation frequently faces challenges from diseases and pests, and early detection is critical for reducing yield losses. However, existing detection models often suffer from insufficient feature extraction for subtle lesions, loss of edge information against complex backgrounds, and high missed-detection rates for small lesions. To address these issues, YOLO-MDFR, a lightweight detection algorithm based on an enhanced YOLOv12s, was proposed for accurate identification of pepper leaf diseases and pests in complex natural environments. [Methods] The dataset was collected in the primary pepper cultivation zone of Gangu County, Tianshui City, Gansu Province, from March 15 to May 20, 2024. The cultivated variety was the locally dominant Capsicum annuum L. var. conoides (Mill.). The samples covered four categories of pepper leaves: healthy leaves, leaves damaged by thrips, leaves infected with tobacco mosaic virus exhibiting yellowing symptoms, and leaves affected by bacterial leaf spot. First, the original YOLOv12s backbone was replaced with an improved MobileNetV4 architecture to reduce model size while preserving feature extraction capability. Specifically, the original 5×5 convolutions in the bottleneck layers of MobileNetV4 were substituted with two sequential 3×3 depthwise separable convolutions. Two stacked 3×3 convolutions cover the same receptive field as a single 5×5 convolution with fewer weights, and the depthwise separable form further decomposes spatial and channel filtering, eliminating redundant computation. Second, a novel dimensional frequency reciprocal attention mixing transformer (D-F-Ramit) module was introduced to enhance sensitivity to lesion boundaries and fine-grained textures.
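The arithmetic behind this substitution can be checked directly. The sketch below is illustrative only, not from the paper: the channel width of 256 and the bias-free parameter counts are assumptions made for the example.

```python
def std_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard k×k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dwsep_conv_params(k: int, c: int) -> int:
    """Depthwise separable conv: depthwise k×k (k*k*c) plus pointwise 1×1 (c*c)."""
    return k * k * c + c * c

def receptive_field(kernel_sizes) -> int:
    """Receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

C = 256  # assumed channel width, for illustration only
one_5x5_standard = std_conv_params(5, C, C)
two_3x3_dwsep = 2 * dwsep_conv_params(3, C)

print(receptive_field([5]), receptive_field([3, 3]))  # 5 5 -- same coverage
print(one_5x5_standard, two_3x3_dwsep)                # 1638400 135680
```

At this assumed width, the two stacked 3×3 depthwise separable convolutions match the 5×5 receptive field with roughly a twelfth of the weights, which is the trade-off the backbone redesign exploits.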
The module first converted feature maps from the spatial domain to the frequency domain using the discrete cosine transform (DCT), capturing high-frequency components that spatial-only attention tends to lose, and then integrated three parallel branches: channel attention, spatial attention, and frequency-domain attention. Finally, a residual aggregation gate-controlled convolution (RAGConv) module was developed for the neck network. It combined a residual aggregation path that collects multi-layer feature information with a gate-control unit that dynamically weights feature components by relevance; the residual structure provided a direct gradient propagation path, alleviating gradient vanishing during backpropagation and ensuring efficient information transfer during feature fusion. A systematic experimental framework was established to evaluate the model comprehensively: (1) ablation studies using a controlled-variable approach verified the individual contributions of the improved MobileNetV4, D-F-Ramit, and RAGConv modules; (2) lesion-scale sensitivity analysis assessed detection performance across lesion sizes, with emphasis on small-spot recognition; (3) resolution impact analysis evaluated five common input resolutions (320×320 to 736×736) to explore the trade-offs among accuracy, speed, and computational efficiency; and (4) embedded deployment validation involved model quantization and implementation on the Rockchip RK3588 platform to measure inference speed and power consumption on edge devices. [Results and Discussions] The proposed YOLO-MDFR achieved an mAP@0.5 of 95.6% on this dataset. Compared with YOLOv12s, it improved accuracy by 2.0 percentage points, reduced parameters by 61.5%, and lowered computational complexity by 68.5%. Real-time testing reached 43.4 f/s on an NVIDIA RTX 4060 GPU (CUDA 12.2) and 22.8 f/s on the Rockchip RK3588 embedded platform at only 3.5 W power consumption, making it suitable for battery-powered field devices.
Lesion-scale analysis showed 33.5% accuracy on lesions smaller than 16×16 pixels, the size range most critical for early detection. Confusion matrix analysis showed reduced misclassification: the bacterial leaf spot/thrips damage confusion rate fell from 5.8% to 2.1%, and the tobacco mosaic virus/healthy leaf rate from 3.2% to 1.5%, for an overall misclassification rate of 2.3%. Experiments across input resolutions revealed a clear performance–resolution trade-off: as resolution increased from 320×320 to 736×736, mAP rose from 89.5% to 96.2%, with diminishing returns beyond 512×512, while computational cost grew roughly quadratically, reducing inference speed from 65.2 f/s to 35.1 f/s. [Conclusions] This study presents YOLO-MDFR, a lightweight detection model for identifying pepper leaf diseases and pests under complex natural conditions. By integrating an improved MobileNetV4 backbone, the D-F-Ramit attention module, and the RAGConv feature-fusion module, YOLO-MDFR outperforms mainstream detection models in both accuracy and efficiency. Systematic deployment experiments yielded optimized configurations for different application scenarios. Despite its strong performance, the model shows limitations in robustness under extreme lighting, generalization to emerging diseases, and detection of small targets under occlusion. Future work will address these issues through ambient-light data fusion, domain adaptation with semi-supervised learning, and binocular vision integration.
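The roughly quadratic cost growth with resolution follows from convolutional FLOPs scaling with the number of spatial positions. A quick sanity check, under the simplifying assumptions of square inputs and an unchanged fully convolutional architecture:

```python
def relative_conv_cost(res_new: int, res_base: int) -> float:
    """Relative FLOPs of a fully convolutional network when the square
    input side changes: cost scales with the pixel count, i.e. side**2."""
    return (res_new / res_base) ** 2

# Moving from 320x320 to 736x736 multiplies per-image cost by (736/320)^2.
print(round(relative_conv_cost(736, 320), 2))  # 5.29
```

This roughly five-fold cost increase against a 6.7-point mAP gain (89.5% to 96.2%) is why mid-range resolutions around 512×512 offer the better accuracy-per-FLOP operating point reported above.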

Key words: YOLO, leaf disease and pest detection, MobileNetV4, lightweight deep learning model, attention mechanism
