Multi-Scale Heterogeneous Feature Synergistic Model for Cotton Leaf Disease Detection

doi:10.12133/j.smartag.SA202601027

Abstract

[Objective] Detecting cotton leaf diseases under natural field conditions is difficult for three reasons: cluttered backgrounds interfere with the image, lesions vary widely in size, and mobile and other edge devices must run inference quickly. Although computer vision is now part of smart agriculture, current lightweight models struggle to achieve accurate detection and efficient computation at the same time, especially when detecting small lesions or suppressing background noise around leaves. To address these problems, MHSF-DETR (Multi-Scale Heterogeneous Synergistic Feature DETR), an improved detection model based on the RT-DETR framework, is proposed. It aims to achieve high-precision, low-power diagnosis in complex agricultural scenarios. [Methods] The main contribution of this study was a complete redesign of the feature extraction and fusion architectures. First, a Hierarchical Context-Selective Perception Network (HCSP-Net) was constructed as the backbone to replace conventional architectures. This backbone applied a different processing strategy according to the depth of the feature maps. At the shallow stages, where the features are still low-level, it used the Micro-Macro Spatial Context Attention (M²-SCA) module. This module applied a channel semantic filter followed by a dual-stream spatial perception structure to capture the high-frequency textures of micro-lesions while preserving macro-level semantics, so that fine details were not lost during downsampling. At the deep feature stage, a Competitive Selection Fusion (CSF) module was added. Unlike traditional static summation, CSF built a dynamic competition and arbitration mechanism that balanced local importance against global coherence through soft competition gates, which sharpened the semantics and filtered out irrelevant background noise.
Second, to address the spatial and semantic misalignment common in cross-level feature fusion, a Learnable Weighted Context Fusion (LWC-Fusion) module was built into the neck network. This module used global amplitude dynamic weighting to learn the blending ratios automatically, so that deep semantic features were aligned with shallow geometric features. In addition, to suppress the artifacts that traditional zero-padding convolutions produce at irregular leaf boundaries, an Edge-Aware Reconstruction Mechanism (EARM) was proposed. Its Edge-Refined Convolution (ER-Conv) and Edge-Refined Convolution C3 Module (ER-ConvC3) integrated reflection padding and partial convolution, which reduced invalid edge noise and computational redundancy without sacrificing the geometric continuity of features. [Results and Discussions] Experiments showed that MHSF-DETR achieved a better balance between detection performance and computational efficiency. Compared with the RT-DETR-R18 baseline, MHSF-DETR increased mean average precision (mAP50) by 3.2 percentage points, from 79.3% to 82.5%, while reducing parameters by 22.42% and GFLOPs by 13.29%. When benchmarked against mainstream detectors, MHSF-DETR offered a better accuracy-efficiency trade-off than models such as YOLOv5m, YOLOv10m, and RT-DETR-R50. Although YOLOv8m retained a marginal mAP50 lead in some scenarios, its considerably higher computational cost made it less practical than MHSF-DETR for real-time deployment. MHSF-DETR matched the lean efficiency of YOLOv10m while delivering higher detection accuracy. Furthermore, ablation studies indicated that these gains stemmed from the structural synergy among the HCSP-Net backbone, the LWC-Fusion neck, and the edge-aware reconstruction modules, rather than from isolated component upgrades.
These results supported the effectiveness of the MHSF-DETR design in optimizing feature extraction and fusion, offering an efficient solution for resource-constrained object detection tasks. [Conclusions] MHSF-DETR mitigates the long-standing conflict between accuracy and efficiency in cotton disease monitoring. By combining hierarchical perception with adaptive fusion and edge refinement, the architecture addresses both lesion scale disparity and limited computing resources. This work provides a feasible, lightweight template for real-time diagnosis on agricultural edge devices and supports practical application in smart farming. Future work will extend validation to other plant organs (bolls, stems) and conduct field trials on embedded hardware to test real-world robustness.
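The Methods part of the abstract attributes boundary artifacts to zero padding and states that ER-Conv uses reflection padding instead. The following is a minimal one-dimensional sketch of that single idea, not the authors' ER-Conv: the helper names `pad_zero`, `pad_reflect`, and `box_filter` are hypothetical, a plain averaging filter stands in for a learned convolution, and partial convolution is not covered.

```python
# Illustrative sketch only: compares zero padding with reflection padding
# for a simple averaging filter. All names here are hypothetical.

def pad_zero(row, k):
    """Pad k zeros on each side (conventional zero padding)."""
    return [0.0] * k + list(row) + [0.0] * k


def pad_reflect(row, k):
    """Mirror the k values next to each border, excluding the border value."""
    row = list(row)
    if k > len(row) - 1:
        raise ValueError("reflection padding needs k <= len(row) - 1")
    left = row[k:0:-1]            # e.g. [1, 2, 3, 4], k=2 -> [3, 2]
    right = row[-2:-k - 2:-1]     # e.g. [1, 2, 3, 4], k=2 -> [3, 2]
    return left + row + right


def box_filter(row, k, pad):
    """Average over a window of width 2k+1 after padding with `pad`."""
    padded = pad(row, k)
    width = 2 * k + 1
    return [sum(padded[i:i + width]) / width for i in range(len(row))]


if __name__ == "__main__":
    flat_leaf = [6.0] * 5  # a uniform region that should stay uniform
    print(box_filter(flat_leaf, 1, pad_zero))     # [4.0, 6.0, 6.0, 6.0, 4.0]
    print(box_filter(flat_leaf, 1, pad_reflect))  # [6.0, 6.0, 6.0, 6.0, 6.0]
```

With zero padding the two border outputs drop from 6.0 to 4.0 even though the input is uniform, which is the kind of edge artifact the abstract describes. Reflection padding keeps the response flat at the border.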

Key words: cotton disease detection, RT-DETR, lightweight model, attention mechanism, feature fusion

CLC Number: TP391.4; S435

SHEN Xueli, ZHANG Yue, JIN Haibo, ZHANG Xuxu. Multi-Scale Heterogeneous Feature Synergistic Model for Cotton Leaf Disease Detection[J]. Smart Agriculture, doi: 10.12133/j.smartag.SA202601027.

Figures/Tables (16): Fig. 1 to Fig. 10; Table 1 to Table 6

Parameter	Value
Input image size	640×640
Initial learning rate	0.0001
Weight decay	0.0005
Batch size	4
Total training epochs	200
Optimizer	AdamW

Experiment	LWC-Fusion	EARM	HCSP-Net	Precision/%	Recall/%	mAP50/%	Parameters/M	GFLOPs
1	×	×	×	88.6	76.1	79.3	19.89	57.2
2	√	×	×	86.4	75.5	79.8	20.44	57.6
3	×	√	×	85.8	78.2	80.2	19.13	52.5
4	×	×	√	84.8	77.8	81.1	16.00	55.0
5	√	√	×	87.1	78.9	80.9	19.66	53.2
6	√	×	√	88.2	78.5	81.8	16.48	55.4
7	×	√	√	88.4	79.2	82.1	15.01	49.4
8	√	√	√	90.2	79.6	82.5	15.43	49.6
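The headline figures in the abstract (a 3.2 percentage point mAP gain, 22.42% fewer parameters, 13.29% fewer GFLOPs) can be reproduced from experiment 1 (RT-DETR-R18 baseline) and experiment 8 (all three modules enabled) in the ablation table above. The short check below is only an arithmetic sketch; the dictionary keys and the helper `relative_reduction` are illustrative names, not part of the original work.

```python
# Values copied from ablation experiments 1 and 8; names are illustrative.
baseline = {"map50": 79.3, "params_m": 19.89, "gflops": 57.2}  # experiment 1
full = {"map50": 82.5, "params_m": 15.43, "gflops": 49.6}      # experiment 8


def relative_reduction(before, after):
    """Percentage reduction of `after` relative to `before`."""
    return (before - after) / before * 100.0


# mAP50 gain is an absolute difference in percentage points.
map_gain = full["map50"] - baseline["map50"]
# Parameter and GFLOPs savings are relative reductions in percent.
param_cut = relative_reduction(baseline["params_m"], full["params_m"])
gflops_cut = relative_reduction(baseline["gflops"], full["gflops"])

if __name__ == "__main__":
    print(f"mAP50 gain: {map_gain:.1f} percentage points")  # 3.2
    print(f"Parameter reduction: {param_cut:.2f}%")          # 22.42
    print(f"GFLOPs reduction: {gflops_cut:.2f}%")            # 13.29
```

Note that the mAP figure is a difference in percentage points, whereas the parameter and GFLOPs figures are relative reductions against the baseline.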

Algorithm	Precision/%	Recall/%	mAP50/%	Parameters/M	GFLOPs
YOLOv5m	87.7	77.4	81.9	21.32	49.2
YOLOv8m	89.1	78.2	83.1	25.84	78.7
YOLOv10m	85.3	74.5	79.6	15.41	59.3
RT-DETR-R18	88.6	76.1	79.3	19.89	57.2
RT-DETR-R50	90.3	80.5	83.8	42.65	110.5
SSD	78.5	68.2	74.3	26.28	62.4
Faster R-CNN	81.3	76.9	80.1	136.02	358.5
EdgeNeXt-B	82.4	71.5	76.9	18.51	3.84
MobileViT-S	76.5	64.2	70.8	5.63	2.03
MHSF-DETR	90.2	79.6	82.5	15.43	49.6

Algorithm	Precision/%	Recall/%	mAP50/%	Parameters/M	GFLOPs
YOLOv5m	84.2	74.1	78.5	21.32	49.2
YOLOv8m	86.5	75.3	80.1	25.84	78.7
YOLOv10m	82.1	71.8	78.4	15.41	59.3
RT-DETR-R18	85.4	76.8	76.8	19.89	57.2
RT-DETR-R50	87.2	80.5	80.7	42.65	110.5
SSD	75.2	70.1	70.2	26.28	62.4
Faster R-CNN	78.6	76.5	76.5	136.02	358.5
EdgeNeXt-B	80.2	69.6	74.1	18.51	3.84
MobileViT-S	74.5	62.8	68.6	5.63	2.03
MHSF-DETR	87.8	76.9	80.3	15.43	49.6

Algorithm	Precision/%	Recall/%	mAP50/%	Parameters/M	GFLOPs
YOLOv5m	89.5	78.4	84.2	21.32	49.2
YOLOv8m	92.2	82.5	89.6	25.84	78.7
YOLOv10m	92.8	81.9	88.1	15.41	59.3
RT-DETR-R18	88.2	77.5	82.5	19.89	57.2
RT-DETR-R50	93.5	83.1	88.6	42.65	110.5
SSD	82.4	72.1	76.8	26.28	62.4
Faster R-CNN	87.5	80.2	81.9	136.02	358.5
EdgeNeXt-B	83.2	73.3	78.5	18.51	3.84
MobileViT-S	80.5	69.4	73.1	5.63	2.03
MHSF-DETR	92.5	81.1	86.9	15.43	49.6