Welcome to the official website of Smart Agriculture (Chinese and English edition)!

Smart Agriculture


A Cotton Leaf Disease Detection Model Based on Multi-Scale Heterogeneous Feature Synergy

SHEN Xueli, ZHANG Yue, JIN Haibo, ZHANG Xuxu

  1. School of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Received: 2026-01-22  Published: 2026-03-13
  • Funding:
    National Natural Science Foundation of China (62173171)
  • About the author:

    SHEN Xueli, Ph.D., professor; research interests: image and visual information computing. E-mail:

  • Corresponding author:
    ZHANG Yue, master's student; research interests: image and visual information computing. E-mail:

Multi-Scale Heterogeneous Feature Synergistic Model for Cotton Leaf Disease Detection

SHEN Xueli, ZHANG Yue, JIN Haibo, ZHANG Xuxu

  1. School of Software, Liaoning Technical University, Huludao 125105, China
  • Received: 2026-01-22  Online: 2026-03-13
  • Foundation items: National Natural Science Foundation of China (62173171)
  • About the author:

    SHEN Xueli, E-mail:

  • Corresponding author:
    ZHANG Yue, E-mail:

Abstract (Chinese version):

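[Objective/Significance] Natural field scenes pose several challenges at once: strong background interference, lesions distributed across multiple scales, and the limited computing power of edge devices. Existing agricultural computer vision applications therefore urgently need efficient solutions. Building on an improved RT-DETR (Real-Time Detection Transformer) architecture, this study proposes a cotton leaf disease detection model with multi-scale heterogeneous feature synergy (Multi-Scale Heterogeneous Synergetic Feature DETR, MHSF-DETR). [Methods] Its core is a redesign of the internal mechanisms of feature extraction and fusion. The model builds a hierarchical context-selective perception backbone: at the shallow feature stage, micro-macro spatial context attention is embedded to prevent fine lesion details from being overwhelmed in the deep network; at the deep feature stage, a competitive selection fusion module is integrated to improve semantic localization accuracy. At the feature fusion level, a learnable weighted context fusion module dynamically learns weights to resolve semantic misalignment between levels. It is combined with an edge-aware reconstruction mechanism whose reflection padding and partial convolution strategies effectively suppress edge artifacts, preserving feature robustness while reducing computational cost. [Results and Discussions] The architecture reduces the parameter count and computational cost by 22.42% and 13.29%, respectively, while raising mean average precision by 3.2 percentage points, jointly optimizing detection accuracy and inference efficiency. [Conclusions] MHSF-DETR can effectively perform low-power detection in complex scenes and provides key technical support for building efficient real-time field monitoring systems.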
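The results above combine two kinds of change: relative reductions in parameters and computation (percent of the baseline value) and an absolute gain in mean average precision (percentage points). The minimal sketch below uses hypothetical numbers, not values from the paper, to show how the two units differ; the function names are illustrative only.

```python
def relative_change_pct(baseline: float, new: float) -> float:
    """Change of `new` relative to `baseline`, in percent (negative = reduction)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


def pp_change(baseline_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - baseline_pct


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the paper.
    params_before, params_after = 20.0, 15.0  # millions of parameters
    map_before, map_after = 60.0, 63.0        # mAP in percent
    print(f"params: {relative_change_pct(params_before, params_after):+.2f}%")
    print(f"mAP: {pp_change(map_before, map_after):+.1f} pp "
          f"({relative_change_pct(map_before, map_after):+.1f}% relative)")
```

In this made-up example, a 3.0-point gain on a 60% baseline is a 5% relative gain, which is why the two units should not be mixed when comparing models.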


Abstract:

[Objective] Detecting cotton leaf diseases in natural fields is difficult because of strong background interference, lesions that vary widely in size, and the limited computing power of edge devices such as mobile phones. Computer vision is now part of smart agriculture, yet current lightweight models struggle to achieve accurate detection and efficient computation at the same time, especially when detecting small lesions or suppressing background noise around leaves. To address these problems, MHSF-DETR (Multi-Scale Heterogeneous Synergistic Feature DETR), an improved detection model based on the RT-DETR framework, is proposed. It aims to achieve high-precision, low-power diagnosis in complex agricultural scenarios. [Methods] The primary innovation of this study lies in the complete reconfiguration of the feature extraction and fusion architectures. First, a Hierarchical Context-Selective Perception Network (HCSP-Net) was constructed as the backbone to replace conventional architectures. This backbone employed a processing strategy differentiated by feature-map depth. At the shallow stage, where features are simple, it used Micro-Macro Spatial Context Attention (M²-SCA). This module applied a channel semantic filter followed by a dual-stream spatial perception structure to actively capture the high-frequency textures of micro-lesions while preserving macro-level semantics, so that fine details were not lost during downsampling. At the deep feature stage, a Competitive Selection Fusion (CSF) module was added. Unlike traditional static summation, CSF established a dynamic competitive arbitration mechanism that uses soft competition gates to balance local importance against global coherence, sharpening semantics and filtering out irrelevant background noise.
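The abstract gives only the order of operations in M²-SCA (a channel semantic filter, then dual-stream spatial perception), not its equations. The pure-Python sketch below is one possible reading under stated assumptions: the channel filter is a sigmoid gate on each channel's global mean (loosely in the spirit of squeeze-and-excitation), the micro stream scores how far a pixel deviates from its 3×3 neighbourhood mean, and the macro stream is a wide box average. All function names and window sizes are hypothetical; a real module would use learned convolutions in a deep-learning framework.

```python
import math
from typing import List

Map = List[List[float]]  # one H x W channel


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def box_mean(fmap: Map, i: int, j: int, k: int) -> float:
    """Mean over the (2k+1) x (2k+1) window around (i, j), clipped at borders."""
    h, w = len(fmap), len(fmap[0])
    vals = [fmap[y][x]
            for y in range(max(0, i - k), min(h, i + k + 1))
            for x in range(max(0, j - k), min(w, j + k + 1))]
    return sum(vals) / len(vals)


def channel_filter(feats: List[Map]) -> List[Map]:
    """Assumed channel semantic filter: scale each channel by sigmoid(its global mean)."""
    out = []
    for f in feats:
        gate = sigmoid(sum(map(sum, f)) / (len(f) * len(f[0])))
        out.append([[gate * v for v in row] for row in f])
    return out


def dual_stream_attention(feats: List[Map], k_micro: int = 1,
                          k_macro: int = 3) -> List[Map]:
    """Assumed dual-stream spatial attention on the channel-averaged map.

    micro stream: |x - small-window mean|, a high-frequency (texture) cue
    macro stream: large-window mean, a coarse context cue
    """
    h, w = len(feats[0]), len(feats[0][0])
    avg = [[sum(f[i][j] for f in feats) / len(feats) for j in range(w)]
           for i in range(h)]
    attn = [[sigmoid(abs(avg[i][j] - box_mean(avg, i, j, k_micro))
                     + box_mean(avg, i, j, k_macro))
             for j in range(w)] for i in range(h)]
    # Every channel is reweighted by the same spatial map, whose values lie in (0, 1).
    return [[[f[i][j] * attn[i][j] for j in range(w)] for i in range(h)]
            for f in feats]


def m2_sca_sketch(feats: List[Map]) -> List[Map]:
    """Channel filtering first, then dual-stream spatial attention."""
    return dual_stream_attention(channel_filter(feats))
```

A small isolated peak, such as a tiny lesion, receives a large micro-stream score because it differs strongly from its neighbours, so its attention weight stays high even when the surrounding area is dark.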
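To illustrate the contrast between static summation and soft competition, here is a minimal sketch of a softmax gate over two per-channel descriptors (a "local" branch and a "global" branch). It is an assumption, not the paper's CSF: the gate logits are simply the branch activations scaled by a hypothetical `temperature`, whereas a trained module would predict them with learned layers.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def static_sum(local: Sequence[float], glob: Sequence[float]) -> List[float]:
    """Conventional baseline: fixed, input-independent addition."""
    return [x_l + x_g for x_l, x_g in zip(local, glob)]


def soft_competition_fuse(local: Sequence[float], glob: Sequence[float],
                          temperature: float = 1.0) -> List[float]:
    """Hypothetical CSF-style fusion of two per-channel descriptors.

    In every channel the two branches compete through a softmax gate. The
    gate logits here are the activations divided by `temperature` (an
    assumption; a trained module would predict them with learned layers).
    A lower temperature approaches a hard, winner-take-all choice.
    """
    if len(local) != len(glob):
        raise ValueError("both branches need the same number of channels")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    fused = []
    for x_l, x_g in zip(local, glob):
        w_l, w_g = softmax([x_l / temperature, x_g / temperature])
        fused.append(w_l * x_l + w_g * x_g)  # convex combination of the branches
    return fused
```

Because the gate weights sum to one in each channel, the fused value always lies between the two branch values, whereas `static_sum` adds both branches regardless of which one is more informative.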
Second, to address the spatial and semantic misalignment commonly seen in cross-level feature fusion, a Learnable Weighted Context Fusion (LWC-Fusion) module was introduced into the neck network. This module used global-amplitude dynamic weighting to learn the optimal blending ratios autonomously, so that deep semantic features were precisely aligned with shallow geometric features. Moreover, to suppress the artifacts that traditional zero-padding convolutions produce at irregular leaf boundaries, an Edge-Aware Reconstruction Mechanism (EARM) was proposed. Using Edge-Refined Convolution (ER-Conv) and the Edge-Refined Convolution C3 module (ER-ConvC3), which integrate reflection padding and partial convolution, the model suppressed spurious edge noise and reduced computational redundancy without sacrificing the geometric continuity of features. [Results and Discussions] Experimental results showed that MHSF-DETR achieved a better balance between detection performance and computational efficiency. Compared with the RT-DETR-R18 baseline, MHSF-DETR increased mean average precision (mAP) by 3.2 percentage points while reducing parameters by 22.42% and GFLOPs by 13.29%. Against mainstream detectors, MHSF-DETR consistently outperformed models such as YOLOv5m, YOLOv10m, and RT-DETR-R50. Although YOLOv8m retained a marginal mAP50 lead in specific scenarios, its much higher computational overhead made it less practical than MHSF-DETR for real-time deployment. MHSF-DETR matched the lightweight efficiency of YOLOv10m while achieving higher detection accuracy. Furthermore, ablation studies confirmed that these gains stemmed from the structural synergy among the HCSP-Net backbone, the LWC-Fusion neck, and the edge-aware reconstruction modules, rather than from any single component upgrade.
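LWC-Fusion is described only as learning blending ratios through global-amplitude dynamic weighting. The sketch below assumes (i) BiFPN-style fast normalized fusion for the learnable weights, where non-negative scalars are normalized to sum to about one, and (ii) that "global amplitude" means each level's mean absolute activation, used to rescale that level before blending. Inputs are flattened feature maps already resized to a common resolution; `lwc_fuse` and `alphas` are hypothetical names, and the alphas would be trained rather than fixed.

```python
from typing import List, Sequence

EPS = 1e-4  # small constant, as in BiFPN fast normalized fusion


def global_amplitude(x: Sequence[float]) -> float:
    """Mean absolute activation, used here as a level's 'global amplitude'."""
    return sum(abs(v) for v in x) / len(x)


def lwc_fuse(levels: Sequence[Sequence[float]],
             alphas: Sequence[float]) -> List[float]:
    """Hypothetical LWC-Fusion-style blend of same-size, flattened feature maps.

    1. Learnable scalars `alphas` are clipped at zero (ReLU) and normalized so
       the blending ratios sum to roughly one (BiFPN fast normalized fusion).
    2. Each level is divided by its global amplitude before blending, so a
       level cannot dominate merely because its activations are larger.
    """
    if not levels or len(levels) != len(alphas):
        raise ValueError("need one alpha per input level")
    n = len(levels[0])
    if any(len(x) != n for x in levels):
        raise ValueError("levels must be resized to a common shape first")
    relu = [max(a, 0.0) for a in alphas]
    total = sum(relu) + EPS
    ratios = [r / total for r in relu]
    fused = [0.0] * n
    for ratio, x in zip(ratios, levels):
        amp = global_amplitude(x) + EPS
        for k, v in enumerate(x):
            fused[k] += ratio * v / amp
    return fused
```

For example, `lwc_fuse([[2.0, 2.0], [10.0, 10.0]], [1.0, 1.0])` returns values close to 1.0; without the amplitude step, the second level would contribute five times as much as the first.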
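EARM's two ingredients can be illustrated separately. The sketch below assumes that "partial convolution" means the FasterNet-style PConv, which convolves only a fraction of the channels and passes the rest through unchanged (the abstract does not say which variant is used). For brevity it applies the same 3×3 kernel to each selected channel independently. Reflection padding mirrors border pixels instead of inserting zeros, so a constant map stays constant after an averaging filter; with zero padding, each corner output would drop to 4/9 of the true value. Names such as `partial_conv` and `ratio` are hypothetical.

```python
from typing import List

Map = List[List[float]]  # one H x W channel


def reflect(i: int, n: int) -> int:
    """Mirror an out-of-range index without repeating the border pixel,
    the same convention as torch.nn.ReflectionPad2d (valid for pad < n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def conv3x3_reflect(fmap: Map, kernel: Map) -> Map:
    """3x3 cross-correlation (as in CNN layers) with reflection padding of 1."""
    h, w = len(fmap), len(fmap[0])
    if h < 2 or w < 2:
        raise ValueError("reflection padding of 1 needs at least a 2 x 2 input")
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += (kernel[di + 1][dj + 1]
                            * fmap[reflect(i + di, h)][reflect(j + dj, w)])
            out[i][j] = acc
    return out


def partial_conv(feats: List[Map], kernel: Map, ratio: int = 4) -> List[Map]:
    """Assumed FasterNet-style partial convolution: only the first
    len(feats) // ratio channels (at least one) are convolved; the rest are
    copied through unchanged, which is where the FLOP savings come from."""
    cp = max(1, len(feats) // ratio)
    convolved = [conv3x3_reflect(f, kernel) for f in feats[:cp]]
    untouched = [[row[:] for row in f] for f in feats[cp:]]
    return convolved + untouched
```

With `ratio=4`, only a quarter of the channels pay the convolution cost, and the untouched channels still carry their information forward to later layers.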
These results validate the effectiveness of the MHSF-DETR design in optimizing feature extraction and fusion and offer an efficient solution for resource-constrained object detection tasks. [Conclusions] MHSF-DETR effectively eases the long-standing trade-off between accuracy and efficiency in cotton disease monitoring. By combining hierarchical perception with adaptive fusion and edge refinement, the architecture counters the twin challenges of lesion-scale disparity and limited computing resources. This work provides a feasible, lightweight template for real-time diagnosis on agricultural edge devices and supports practical application in smart farming ecosystems. Future work will extend validation to other plant organs (bolls and stems) and conduct rigorous field trials on embedded hardware to test real-world robustness.

Key words: cotton disease detection, RT-DETR, lightweight model, attention mechanism, feature fusion
