
Smart Agriculture ›› 2026, Vol. 8 ›› Issue (1): 167-177. doi: 10.12133/j.smartag.SA202509005

• Information Processing and Decision Making •

Object Detection Method of Maize Ears Within Canopy Based on CornYOLO

GAO Guangfu1, WANG Qilei2, SONG Liwen1, FENG Haikuan3, SHI Lei1, YANG Hao3, LIU Yang1, YUE Jibo1

  1. College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
    2. Henan Jinyuan Seed Industry Co., Ltd., Zhengzhou 450002, China
    3. Key Laboratory of Quantitative Remote Sensing in Agriculture, Ministry of Agriculture and Rural Affairs / Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-09-02  Online: 2026-01-30
  • Foundation items: Natural Science Foundation of Henan (252300421839); National Natural Science Foundation (42101362)
  • About author:

    GAO Guangfu, E-mail:

  • Corresponding author:
    YUE Jibo, E-mail:

Abstract:

[Objective] As a major grain crop, maize plays a critical role in global food security. The maize ear is a key phenotypic trait, providing essential information on the plant's physiological and agronomic status: its morphological characteristics, size, and color effectively reflect growth status and potential yield. Accurately acquiring images of maize ears in the field across different growth stages is therefore crucial for breeding research and yield prediction. Traditional field detection of maize ears relies heavily on manual labor, which is inefficient and labor-intensive and struggles to meet the high-throughput demands of modern precision breeding programs. Efficient, automated detection technologies that operate reliably under real-world field conditions are urgently needed. To meet the need for efficient acquisition of maize ear phenotypic traits in field breeding work, the objective of this research was to develop a robust object detection solution suitable for large-scale field environments. An improved CornYOLO model based on the YOLO11n (You Only Look Once) architecture was designed to enhance the detection accuracy and efficiency of maize ears in complex field environments. [Methods] Image data were acquired using an unmanned ground vehicle (UGV) equipped with a high-resolution panoramic camera, which traversed multiple experimental plots under varying lighting and growth conditions. A dataset containing 1 152 annotated samples was constructed, covering diverse ear morphologies and occlusion scenarios. Dynamic data augmentation techniques were applied during training to enhance the model's generalization capability. Three key enhancements were introduced to the YOLO11n detection framework.
First, a cross stage partial network with dynamic pointwise spatial attention (C2PDA) module was designed to replace the cross stage partial with pointwise spatial attention (C2PSA) module in the YOLO11 backbone network. This module enhanced spatial discriminability and channel sensitivity in feature representation through the collaborative integration of a dynamic channel weighting mechanism and position-aware modeling, significantly improving the model's performance in identifying maize ears under challenging field conditions such as occlusion by stems and leaves and multi-scale target distribution. Second, the spatial pyramid pooling-fast (SPPF) module in the original model was replaced with a feature refinement module (FRM) to optimize multi-scale feature fusion. The FRM operates via directional feature decomposition and an adaptive attention mechanism: it captures fine-grained spatial structural information through horizontal and vertical bidirectional pooling and combines spatial-channel cooperative attention for dynamic feature calibration, thereby improving recognition accuracy across varying ear sizes and complex backgrounds. Finally, the unified intersection over union (UIoU) loss function was introduced to optimize bounding box regression accuracy. UIoU emphasizes weight allocation among prediction boxes of different qualities: it adaptively adjusts the weight of each prediction box's loss term based on the IoU value or a monotonic function of it, assigning higher weights to lower-quality predictions to prioritize their optimization while reducing weights for high-quality boxes to prevent over-optimization. [Results and Discussions] Experimental results demonstrate that CornYOLO achieved a mAP@50 of 89.3% on the validation set, with the F1-Score increasing by 2.5 percentage points.
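The bidirectional pooling at the heart of the FRM described above can be sketched minimally in NumPy. The mean pooling, additive fusion, and sigmoid gate here are illustrative assumptions, not the published module design:

```python
import numpy as np

def bidirectional_pool_attention(x: np.ndarray) -> np.ndarray:
    """Toy sketch of horizontal/vertical bidirectional pooling attention.

    x: feature map of shape (C, H, W). The real FRM adds learnable
    spatial-channel attention; this sketch only shows the directional
    decomposition and gating idea.
    """
    horiz = x.mean(axis=2, keepdims=True)   # pool along width  -> (C, H, 1)
    vert = x.mean(axis=1, keepdims=True)    # pool along height -> (C, 1, W)
    # Fuse the two directional descriptors and squash to (0, 1) as a gate.
    gate = 1.0 / (1.0 + np.exp(-(horiz + vert)))  # broadcasts to (C, H, W)
    return x * gate  # recalibrated feature map, same shape as input
```

Because the gate lies in (0, 1), each position is attenuated according to its row and column context, which is the intuition behind capturing fine-grained spatial structure in both directions.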
Compared to widely used lightweight models including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, real-time detection transformer (RT-DETR), and YOLO13n, CornYOLO showed significantly superior detection performance in complex field environments, with mAP@50 improvements of 2.2, 1.9, 1.8, 5.7, 12.6, and 2.4 percentage points, respectively. These results validate that CornYOLO can efficiently and accurately extract maize ear images under field conditions, providing a technical foundation for precise phenotypic evaluation and yield prediction. Ablation studies were also conducted. Introducing the C2PDA module improved the model's mAP@50 by 0.5 percentage points and the F1-Score by 0.5 percentage points. Incorporating the FRM module further enhanced multi-scale detection performance and increased the F1-Score by 1.5 percentage points; however, integrating the two modules produced a small number of low-quality detection boxes, and the original loss function was inefficient at optimizing such boxes, so mAP@50 did not improve. To address this issue, the UIoU loss function was introduced. By dynamically adjusting weight assignments based on prediction quality, it significantly improved the regression performance for low-quality detection boxes, thereby enhancing the localization accuracy and convergence stability of the model in dense target scenarios. The final CornYOLO model exhibited excellent overall performance: compared to the original YOLO11n, the F1-Score increased by 2.5 percentage points and mAP@50 improved by 1.1 percentage points. These results demonstrate that CornYOLO effectively enhances the detection of maize ears in complex field environments relative to the baseline YOLO11n model.
[Conclusions] The CornYOLO model proposed in this study incorporates three key components, C2PDA, FRM, and UIoU, which together enhance convergence and localization performance in dense and occluded scenes and enable the model to identify maize ears effectively and precisely under practical conditions, thereby providing reliable technical support for phenotypic analysis and yield prediction in maize breeding. Future work will focus on extending the model to other crop types and further optimizing inference efficiency for real-time deployment on mobile platforms.
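The quality-based weighting idea behind the UIoU loss described in the abstract can be illustrated with a minimal Python sketch. The specific (2 − IoU)^γ weight below is an assumed monotone-decreasing choice for illustration only, not the published UIoU formulation:

```python
def iou(a: tuple, b: tuple) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def weighted_box_loss(pred: tuple, gt: tuple, gamma: float = 1.0) -> float:
    """Quality-aware IoU loss sketch: low-IoU (low-quality) predictions
    receive larger weights so their regression is prioritized, while
    high-IoU boxes are down-weighted to avoid over-optimization."""
    q = iou(pred, gt)
    weight = (2.0 - q) ** gamma  # monotone-decreasing in IoU (assumption)
    return weight * (1.0 - q)    # weighted IoU regression loss
```

For a perfect prediction (IoU = 1) the loss vanishes, while a disjoint prediction (IoU = 0) receives double weight, so the optimizer concentrates on exactly the low-quality boxes that motivated the switch away from the original loss.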

Key words: maize ears, unmanned ground vehicle, YOLO11, panoramic camera, object detection

CLC Number: