
Smart Agriculture

   

Multi-Scale Heterogeneous Feature Synergistic Model for Cotton Leaf Disease Detection

SHEN Xueli, ZHANG Yue, JIN Haibo, ZHANG Xuxu

  1. School of Software, Liaoning Technical University, Huludao 125105, China
  • Received: 2026-01-22 Online: 2026-03-13
  • Foundation items: National Natural Science Foundation of China (62173171)
  • About author:

    SHEN Xueli, E-mail:

  • corresponding author:
    ZHANG Yue, E-mail:

Abstract:

[Objective] Detecting cotton leaf diseases in natural field conditions is difficult for three reasons: complex backgrounds interfere with the image, lesions vary widely in size, and mobile and other edge devices must process images in real time. Although computer vision is now an established part of smart agriculture, current lightweight models struggle to deliver both accurate detection and efficient computation, especially when detecting small lesions or suppressing background noise around leaves. To address these problems, MHSF-DETR (Multi-Scale Heterogeneous Synergistic Feature DETR), an improved detection model based on the RT-DETR framework, is proposed. It aims to achieve high-precision, low-power diagnosis in complex agricultural scenarios. [Methods] The primary innovation of this study was a complete reconfiguration of the feature extraction and fusion architectures. Firstly, a Hierarchical Context-Selective Perception Network (HCSP-Net) was constructed as the backbone to replace conventional architectures. This backbone employed a differentiated processing strategy tailored to the depth of the feature maps. At the shallow feature stage, where features were still low-level, a Micro-Macro Spatial Context Attention (M²-SCA) module was used. This module applied a channel semantic filter followed by a dual-stream spatial perception structure to capture the high-frequency textures of micro-lesions while preserving macro semantics, so that fine details were not lost during downsampling. At the deep feature stage, a Competitive Selection Fusion (CSF) module was added. Unlike the traditional static summation approach, CSF established a dynamic competition arbitration mechanism that balanced local importance against global coherence through soft competition gates, sharpening the semantics and filtering out irrelevant background noise.
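As a rough illustration of the soft competition gate idea in the CSF description above, the sketch below fuses two feature vectors with a per-element softmax gate instead of a fixed sum. The abstract does not specify how CSF computes its gates, so the function name `soft_competition_fuse`, the `temperature` parameter, and the use of the raw branch responses as gate scores are all assumptions made for illustration, not the authors' implementation.

```python
import math


def soft_competition_fuse(local_feat, global_feat, temperature=1.0):
    """Fuse two equal-length feature vectors with a per-element soft gate.

    Hypothetical sketch: each position lets the two branches "compete"
    through a two-way softmax over their own responses, so the stronger
    response receives the larger weight. A static sum would instead give
    both branches the same fixed weight everywhere.
    """
    if len(local_feat) != len(global_feat):
        raise ValueError("both branches must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    fused = []
    for a, b in zip(local_feat, global_feat):
        # Subtract the max before exponentiating for numerical stability.
        m = max(a, b)
        exp_a = math.exp((a - m) / temperature)
        exp_b = math.exp((b - m) / temperature)
        w_local = exp_a / (exp_a + exp_b)  # gate weight for the local branch
        fused.append(w_local * a + (1.0 - w_local) * b)
    return fused


if __name__ == "__main__":
    local_branch = [2.0, 10.0]
    global_branch = [2.0, 0.0]
    print(soft_competition_fuse(local_branch, global_branch))
```

When both branches respond equally, the gate is 0.5 and the output equals the shared value. When one branch dominates, the output follows it closely rather than being averaged toward the weaker response. A larger `temperature` softens the competition toward a plain average.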
Secondly, to tackle the spatial and semantic misalignment commonly seen in cross-level feature fusion, a Learnable Weighted Context Fusion (LWC-Fusion) module was created inside the neck network. This module used global amplitude dynamic weighting to learn the best blending ratios autonomously, so that deep semantic features were aligned precisely with shallow geometric features. Moreover, to suppress the artifacts that traditional zero-padding convolutions produce at irregular leaf boundaries, an Edge-Aware Reconstruction Mechanism (EARM) was proposed. By using Edge-Refined Convolution (ER-Conv) and the Edge-Refined Convolution C3 Module (ER-ConvC3), which integrated reflection padding and partial convolution techniques, the model suppressed invalid edge noise and reduced computational redundancy without sacrificing the geometric continuity of features. [Results and Discussions] Experiments showed that the proposed MHSF-DETR achieved a better balance between detection performance and computational efficiency. Compared to the RT-DETR-R18 baseline, MHSF-DETR yielded a 3.2 percentage point increase in mean average precision (mAP), while reducing parameters by 22.42% and GFLOPs by 13.29%. When benchmarked against mainstream detectors, MHSF-DETR consistently outperformed models such as YOLOv5m, YOLOv10m, and RT-DETR-R50. Although YOLOv8m maintained a marginal mAP50 lead in specific scenarios, its much higher computational overhead made it less practical than MHSF-DETR for real-time deployment. MHSF-DETR matched the lean efficiency of YOLOv10m while exceeding it in detection accuracy. Furthermore, extensive ablation studies confirmed that these performance gains stemmed from the structural synergy among the HCSP-Net backbone, the LWC-Fusion neck, and the specialized reconstruction modules, rather than from isolated component upgrades.
These results validated the effectiveness of the MHSF-DETR design in optimizing feature extraction and fusion, offering a highly efficient solution for resource-constrained object detection tasks. [Conclusions] MHSF-DETR addresses the longstanding conflict between accuracy and efficiency in cotton disease monitoring. By combining hierarchical perception with adaptive fusion and edge refinement, the architecture can effectively counteract the dual challenges of scale disparity and resource limitations. This work provides a feasible, lightweight template for real-time diagnostics on agricultural edge devices, opening up possibilities for practical application within smart farming ecosystems. Future work will extend validation to other plant structures (bolls, stems) and focus on rigorous field trials on embedded hardware to test real-world robustness.
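The Methods text above attributes boundary artifacts to zero padding and names reflection padding as one ingredient of ER-Conv. The following minimal sketch, assuming a 1D signal and a simple averaging kernel as stand-ins for a 2D feature map and a learned convolution, shows why the padding choice matters at the border. The helper names `pad_1d` and `box_filter` are hypothetical, and the partial convolution part of ER-Conv is not modeled here.

```python
def pad_1d(signal, pad, mode):
    """Pad a 1D list on both sides using 'zero' or 'reflect' mode."""
    if mode == "zero":
        return [0.0] * pad + list(signal) + [0.0] * pad
    if mode == "reflect":
        if pad >= len(signal):
            raise ValueError("reflect padding needs pad < len(signal)")
        # Mirror around the edge sample without repeating it.
        left = [signal[i] for i in range(pad, 0, -1)]
        right = [signal[-1 - i] for i in range(1, pad + 1)]
        return left + list(signal) + right
    raise ValueError("mode must be 'zero' or 'reflect'")


def box_filter(signal, kernel_size=3, mode="reflect"):
    """Apply an odd-sized averaging kernel, keeping the output length."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    pad = kernel_size // 2
    padded = pad_1d(signal, pad, mode)
    return [
        sum(padded[i:i + kernel_size]) / kernel_size
        for i in range(len(signal))
    ]


if __name__ == "__main__":
    flat_region = [6.0, 6.0, 6.0, 6.0]  # a uniform patch touching the border
    print("zero   :", box_filter(flat_region, 3, "zero"))
    print("reflect:", box_filter(flat_region, 3, "reflect"))
```

On a uniform patch, zero padding pulls the two border outputs down to 4.0, a spurious edge that is not present in the data, while reflection padding keeps every output at 6.0.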

Key words: cotton disease detection, RT-DETR, lightweight model, attention mechanism, feature fusion

CLC Number: