
Smart Agriculture ›› 2026, Vol. 8 ›› Issue (1): 167-177. DOI: 10.12133/j.smartag.SA202509005

• Information Processing and Decision Making •

Object Detection Method of Maize Ears Within Canopy Based on CornYOLO

GAO Guangfu1(), WANG Qilei2, SONG Liwen1, FENG Haikuan3, SHI Lei1, YANG Hao3, LIU Yang1, YUE Jibo1()   

  1. College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, Henan, China
    2. Henan Jinyuan Seed Industry Co., Ltd., Zhengzhou 450002, Henan, China
    3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences / Key Laboratory of Quantitative Remote Sensing in Agriculture, Ministry of Agriculture and Rural Affairs, Beijing 100097, China
  • Received: 2025-09-02 Online: 2026-01-30
  • Foundation items:
    Natural Science Foundation of Henan (252300421839); National Natural Science Foundation of China (42101362); National Natural Science Foundation of China (42571462)
  • About the author:

    GAO Guangfu, M.S. candidate; research interest: UAV image processing. E-mail:

  • Corresponding author:
    YUE Jibo, Ph.D., Associate Professor; research interest: UAV remote sensing image analysis. E-mail:

Object Detection Method of Maize Ears Within Canopy Based on CornYOLO

GAO Guangfu1(), WANG Qilei2, SONG Liwen1, FENG Haikuan3, SHI Lei1, YANG Hao3, LIU Yang1, YUE Jibo1()   

  1. College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
    2. Henan Jinyuan Seed Industry Co., Ltd., Zhengzhou 450002, China
    3. Key Laboratory of Quantitative Remote Sensing in Agriculture, Ministry of Agriculture and Rural Affairs/ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-09-02 Online: 2026-01-30
  • Foundation items: Natural Science Foundation of Henan (252300421839); National Natural Science Foundation of China (42101362); National Natural Science Foundation of China (42571462)
  • About author:

    GAO Guangfu, E-mail:

  • Corresponding author:
    YUE Jibo, E-mail:

Abstract:

[Objective/Significance] Maize is a major grain crop, and the ear is a key phenotypic trait: its morphology, size, and color effectively reflect plant growth status and potential yield. Traditional in-field detection of maize ears relies on manual labor, which is inefficient and labor-intensive. With the spread of high-density planting, maize canopies have become increasingly dense; entering the field to measure ears manually is not only difficult but also prone to causing mechanical damage to plants, further limiting the accuracy and representativeness of the data. Efficient automated detection technology is therefore urgently needed. [Methods] To achieve efficient and accurate detection of maize ears in complex field environments, a CornYOLO model based on an improved YOLO11n (You Only Look Once 11) was proposed. An unmanned ground vehicle carrying a panoramic camera was innovatively used for image acquisition, and a high-quality field dataset was constructed. On this basis, three core model improvements were proposed: 1) a cross stage partial network with dynamic pointwise spatial attention (C2PDA) to improve robustness in recognizing occluded targets; 2) a feature refinement module (FRM) to strengthen multi-scale target detection; and 3) a unified intersection over union (UIoU) loss function to optimize bounding box regression accuracy. This provides an end-to-end solution, from data acquisition to intelligent recognition, for high-throughput acquisition of crop phenotypes in the field. [Results and Discussion] CornYOLO showed excellent detection performance in complex field environments, reaching an mAP@50 of 89.3% on the validation set and improving the F1-Score by 2.5 percentage points over YOLO11n. Compared with the other baseline models, its mAP@50 improved markedly, by up to 12.6 percentage points. Ablation experiments showed that all three modules, C2PDA, FRM, and UIoU, contributed positively to the performance gains, with C2PDA playing the most critical role. [Conclusions] The CornYOLO model can identify maize ears in the field efficiently and accurately, providing reliable technical support for phenotypic analysis and yield prediction in maize breeding and advancing the intelligent extraction of maize ear information.

Key words: maize ears, unmanned ground vehicle, YOLO11, panoramic camera, object detection

Abstract:

[Objective] As a major grain crop, maize plays a critical role in global food security. The maize ear is a key phenotypic trait that provides essential information on the plant's physiological and agronomic status: its morphology, size, and color effectively reflect the plant's growth status and potential yield. Accurately acquiring images of maize ears in the field across growth stages is therefore crucial for breeding research and yield prediction. Traditional field detection of maize ears relies heavily on manual labor, which is inefficient and labor-intensive and struggles to meet the high-throughput demands of modern precision breeding programs. Efficient, automated detection technologies that operate reliably under real-world field conditions are urgently needed. To meet the need for efficient acquisition of maize ear phenotypic traits in field breeding, the objective of this research was to develop a robust object detection solution for large-scale field environments. An improved CornYOLO model based on the YOLO11n (You Only Look Once 11) architecture was designed to enhance the detection accuracy and efficiency of maize ears in complex field environments. [Methods] Image data were acquired using an unmanned ground vehicle (UGV) equipped with a high-resolution panoramic camera, which traversed multiple experimental plots under varying lighting and growth conditions. A dataset containing 1 152 annotated samples was constructed, covering diverse ear morphologies and occlusion scenarios. Dynamic data augmentation was applied during training to enhance the model's generalization capability. Three key enhancements were introduced to the YOLO11n detection framework. 
First, a cross stage partial network with dynamic pointwise spatial attention (C2PDA) module was designed to replace the cross stage partial with pointwise spatial attention (C2PSA) module in the YOLO11 backbone network. Through the collaborative integration of a dynamic channel weighting mechanism and position-aware modeling, this module enhances spatial discriminability and channel sensitivity in feature representation, significantly improving the identification of maize ears under challenging field conditions such as occlusion by stems and leaves and multi-scale target distribution. Second, the spatial pyramid pooling-fast (SPPF) module in the original model was replaced with a feature refinement module (FRM) to optimize multi-scale feature fusion. The FRM operates through directional feature decomposition and an adaptive attention mechanism: it captures fine-grained spatial structural information through horizontal and vertical bidirectional pooling and applies spatial-channel cooperative attention for dynamic feature calibration, improving recognition accuracy across varying ear sizes and complex backgrounds. Finally, the unified intersection over union (UIoU) loss function was introduced to optimize bounding box regression accuracy. UIoU emphasizes weight allocation among prediction boxes of different qualities: it adaptively adjusts the weight of each prediction box's loss term based on the IoU value or a monotonic function of it, assigning higher weights to lower-quality predictions to prioritize their optimization while reducing the weights of high-quality boxes to prevent over-optimization. [Results and Discussions] CornYOLO achieved an mAP@50 of 89.3% on the validation set, with the F1-Score increasing by 2.5 percentage points over YOLO11n. 
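The FRM's directional decomposition can be sketched as follows. This is a minimal illustration, not the published module: the function name `frm_recalibrate`, the use of mean pooling, the additive fusion of the two directions, and the sigmoid gate are all assumptions standing in for the actual bidirectional pooling and spatial-channel cooperative attention.

```python
import numpy as np

def frm_recalibrate(feat):
    """Illustrative FRM-style directional feature calibration (assumed design).

    feat: array of shape (C, H, W). Horizontal and vertical mean pooling
    stand in for the bidirectional pooling; a sigmoid gate stands in for
    the cooperative attention.
    """
    h_pool = feat.mean(axis=2, keepdims=True)   # (C, H, 1): row-wise context
    w_pool = feat.mean(axis=1, keepdims=True)   # (C, 1, W): column-wise context
    # Broadcast the two directional summaries into a position-aware gate in (0, 1).
    attn = 1.0 / (1.0 + np.exp(-(h_pool + w_pool)))
    return feat * attn                          # calibrated features, same shape
```

The gate depends on both the row and the column of each position, so two positions with the same local value can be weighted differently according to their surrounding context, which is the intent behind directional decomposition.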
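The quality-based weight allocation of UIoU can likewise be illustrated with a small sketch. The weight form `(1 - IoU) ** gamma`, the parameter `gamma`, and the helper names are assumptions: the abstract only states that the weight is the IoU value or a monotonic function of it, and this is one such instance, not the published formula.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def quality_weighted_box_loss(pred, gt, gamma=1.0):
    """UIoU-style loss sketch: low-quality boxes (small IoU) receive larger
    weights, high-quality boxes are down-weighted (assumed weight form)."""
    total = 0.0
    for p, g in zip(pred, gt):
        q = iou(p, g)                              # prediction quality in [0, 1]
        total += ((1.0 - q) ** gamma) * (1.0 - q)  # weight x IoU loss
    return total / len(pred)
```

Under this weighting, a poorly localized box contributes much more to the gradient than an almost-perfect one, which matches the stated goal of prioritizing low-quality predictions while preventing over-optimization of high-quality ones.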
Compared with widely used lightweight models, including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, real-time detection transformer (RT-DETR), and YOLO13n, CornYOLO showed significantly superior detection performance in complex field environments, with mAP@50 improvements of 2.2, 1.9, 1.8, 5.7, 12.6, and 2.4 percentage points, respectively. These results validate that CornYOLO can efficiently and accurately extract maize ear images under field conditions, providing a technical foundation for precise phenotypic evaluation and yield prediction. Ablation studies were also conducted. Introducing the C2PDA module improved the model's mAP@50 by 0.5 percentage points and the F1-Score by 0.5 percentage points. Incorporating the FRM module further enhanced multi-scale detection performance and increased the F1-Score by 1.5 percentage points; however, the combination of these two modules produced a small number of low-quality detection boxes, and because the original loss function optimized such boxes inefficiently, mAP@50 did not improve at this stage. To address this issue, the UIoU loss function was introduced. By dynamically adjusting weight assignments according to prediction quality, it significantly improved the regression of low-quality detection boxes, enhancing the model's localization accuracy and convergence stability in dense target scenarios. The final CornYOLO model exhibited excellent overall performance: compared with the original YOLO11n, the F1-Score increased by 2.5 percentage points and mAP@50 improved by 1.1 percentage points. These results demonstrate that CornYOLO effectively enhances maize ear detection in complex field environments relative to the baseline YOLO11n model. 
[Conclusions] The proposed CornYOLO model incorporates three key components: C2PDA, FRM, and UIoU. Together they enhance convergence and localization performance in dense and occluded scenes, enabling the model to identify maize ears effectively and precisely under practical field conditions and thereby providing reliable technical support for phenotypic analysis and yield prediction in maize breeding. Future work will focus on extending the model to other crop types and further optimizing inference efficiency for real-time deployment on mobile platforms.

Key words: maize ears, unmanned ground vehicle, YOLO11, panoramic camera, object detection

CLC number: