Special Topic: Intelligent Agricultural Knowledge Services and Smart Unmanned Farms (Part I)


Seedling Stage Corn Line Detection Method Based on Improved YOLOv8

  • LI Hongbo , 1, 2 ,
  • TIAN Xin , 1, 2 ,
  • RUAN Zhiwen 1, 2 ,
  • LIU Shaowen 1, 2 ,
  • REN Weiqi 1, 2 ,
  • SU Zhongbin , 1, 2 ,
  • GAO Rui 1, 2 ,
  • KONG Qingming 1, 2
  • 1. College of Electrical and Information, Northeast Agricultural University, Harbin 150030, China
  • 2. Key Laboratory of Northeast Smart Agricultural Technology, Ministry of Agriculture and Rural Affairs, Heilongjiang Province, Harbin 150030, China

LI Hongbo and TIAN Xin contributed equally to this work and share first authorship.

LI Hongbo, M.S., teaching assistant; research interest: intelligent visual perception. E-mail: ;
SU Zhongbin, Ph.D., professor; research interest: smart agriculture. E-mail:

TIAN Xin, research interest: intelligent visual perception. E-mail:

Received date: 2024-08-13

  Online published: 2024-11-28

Supported by

The National Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project (2021ZD0110904)

Copyright

Copyright © 2024 by the authors

摘要 (Abstract)

[Objective/Significance] Intelligent agricultural machinery is a new trend in the development of field robots. Crop row extraction is a key step in the autonomous operation of intelligent agricultural machinery and is important for improving field operation efficiency, reducing crop damage, and optimizing resource utilization. However, in complex field environments with strong illumination and weed interference, traditional crop row detection methods often struggle to achieve high accuracy and efficiency. To address these challenges, this study aims to improve the accuracy and efficiency of seedling-stage corn row detection by unmanned agricultural machinery under complex illumination and weed interference, thereby reducing crop damage. [Methods] A crop row detection method based on YOLOv8-G was proposed, combining the YOLOv8-G object detection algorithm, the Affinity Propagation clustering algorithm, and the Least Squares method. YOLOv8-G is a lightweight object detection algorithm improved on the basis of YOLOv8 and GhostNetV2: the center points of corn seedlings are extracted, clustered with the Affinity Propagation algorithm, and the crop rows are then fitted with the Least Squares method. [Results and Discussions] The YOLOv8-G algorithm achieved average precision (AP) values of 98.22%, 98.15%, and 97.32% at 7, 14, and 21 days of the corn seedling stage, respectively, and a crop row extraction accuracy of 96.52% at the seedling stage. Compared with traditional detection methods, YOLOv8-G performed better under complex backgrounds and strong illumination, with some improvement in computational efficiency. [Conclusions] The proposed YOLOv8-G-based crop row detection method can quickly and accurately identify field crops and fit the target crop rows under complex illumination and weed interference. It not only provides strong support for the automatic navigation of unmanned agricultural machinery but also adapts efficiently to embedded devices. While improving agricultural automation, reducing manual operation, and lowering crop damage, it provides a technical guarantee for the real-time operation of intelligent agricultural machinery and has significant application value.

Cite this article

LI Hongbo, TIAN Xin, RUAN Zhiwen, LIU Shaowen, REN Weiqi, SU Zhongbin, GAO Rui, KONG Qingming. Seedling stage corn line detection method based on improved YOLOv8[J]. Smart Agriculture, 2024, 6(6): 72-84. DOI: 10.12133/j.smartag.SA202408008

Abstract

[Objective] Crop line extraction is critical for improving the efficiency of autonomous agricultural machines in the field. However, traditional detection methods struggle to maintain high accuracy and efficiency under challenging conditions, such as strong light exposure and weed interference. This study aims to develop an effective crop line extraction method that combines YOLOv8-G, Affinity Propagation, and the Least Squares method to enhance detection accuracy and performance in complex field environments. [Methods] The proposed method employs machine vision techniques to address common field challenges. YOLOv8-G, an improved object detection algorithm that combines YOLOv8 and GhostNetV2 for lightweight, high-speed performance, was used to detect the central points of crops. These points were then clustered using the Affinity Propagation algorithm, followed by the application of the Least Squares method to extract the crop lines. Comparative tests were conducted to evaluate multiple backbone networks within the YOLOv8 framework, and ablation studies were performed to validate the enhancements made in YOLOv8-G. [Results and Discussions] The performance of the proposed method was compared with classical object detection and clustering algorithms. The YOLOv8-G algorithm achieved average precision (AP) values of 98.22%, 98.15%, and 97.32% for corn detection at 7, 14, and 21 days after emergence, respectively. Additionally, the crop line extraction accuracy across all stages was 96.52%. These results demonstrate the model's ability to maintain high detection accuracy despite challenging conditions in the field. [Conclusions] The proposed crop line extraction method effectively addresses field challenges such as lighting and weed interference, enabling rapid and accurate crop identification. This approach supports the automatic navigation of agricultural machinery, offering significant improvements in the precision and efficiency of field operations.

0 Introduction

Accurate corn line detection is essential for the efficient operation of unmanned agricultural machinery. It helps to reduce crop damage, increases field utilization, and significantly lowers agricultural production costs[1]. However, accurate crop row detection remains a significant challenge due to varying field conditions and the presence of weeds[2]. Although agricultural machines guided by positioning terminals can operate on large-scale farms, they may inadvertently injure seedlings when, for example, the actual crop planting layout deviates from the planned navigation line. Existing methods for detecting corn lines often perform poorly under strong light and struggle to differentiate between crops and weeds[3]. Therefore, it is necessary to install low-cost visual sensors on agricultural machinery to provide more accurate and applicable visual cues for detection and navigation in the field.
The detection of crop lines is mainly divided into two parts: first, the feature points of the crops are extracted, and then the crop line is determined from these feature points. A variety of algorithms have been applied to crop line extraction[4-6]. Although traditional image processing methods are simple and fast, the feature points extracted under strong light are easily disturbed, and distinguishing between crops and weeds is difficult. In recent years, object detection algorithms have also been applied to crop row detection. Ponnambalam et al.[7] used convolutional neural networks to extract crop row feature points and introduced an adaptive multi-ROI method to address the uneven contours of crop rows in hilly terrains, followed by linear regression for crop line detection; this highlights the effectiveness of CNN-based crop row detection and adaptive trajectory fitting in precision agriculture. Liu et al.[8] used the single shot multibox detector (SSD) target detection algorithm to obtain crop feature points, combined Mask R-CNN with the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm to accurately segment overlapping seedling leaves under heavy metal stress, and finally applied the Least Squares method to detect crop lines. De Silva et al.[9] presented a deep convolutional encoder-decoder network for predicting crop line masks from RGB images; the algorithm outperformed baseline methods in both crop row detection and visual servoing-based navigation in realistic field scenarios. De Silva et al.[10] proposed a deep learning-based semantic segmentation technique for identifying and extracting the position and shape of crop lines; the network effectively predicted crop row masks from RGB images and demonstrated robustness against shadows and varying crop growth stages. Although many mainstream target detection methods achieve high accuracy and can effectively distinguish between crops and weeds in complex backgrounds, their large model size and high computational requirements make direct deployment on agricultural machinery difficult. This constraint makes it challenging to implement these methods in embedded systems with limited processing power, which is typical for most agricultural machinery. Additionally, these models often require substantial energy consumption, which can be impractical in field operations where power efficiency is essential.
Previous crop row detection methods typically focus on specific crop growth stages, which limits their adaptability and robustness across different growth periods and environmental conditions[11, 12]. These methods are also highly sensitive to lighting variations, such as direct sunlight, shadows, and low-light scenarios, making it difficult to accurately distinguish between crops and weeds under diverse field conditions. Moreover, while these methods prioritize accuracy, they often overlook efficiency, which is a critical factor for real-time applications in autonomous agricultural machinery. Without the right balance between speed and precision, the usability of such systems for dynamic and continuous field operations remains restricted[13, 14]. This study introduces the lightweight YOLOv8-G detection algorithm, optimized for agricultural scenarios. It uses GhostNetV2 as the backbone to reduce parameters and computational complexity while maintaining high accuracy across multiple crop growth stages. The algorithm incorporates the decoupled fully connected (DFC) attention mechanism and Focal Loss to improve detection in challenging lighting conditions and to handle class imbalance efficiently. These innovations make the YOLOv8-G model highly adaptable, providing faster and more accurate crop row detection, particularly at the seedling stage, supporting autonomous navigation and reducing crop damage.

1 Materials and methods

The overall process of the proposed YOLOv8-G-based crop row detection approach is shown in Fig. 1. First, corn crop images were obtained by capturing videos in a corn field; individual frames were extracted from the videos for further analysis, and LabelImg was used to label the dataset. Feature points were then extracted from the corn using the YOLOv8-G target detection algorithm. To evaluate performance, various object detection algorithms were compared, including YOLOv8, YOLOv7, SSD[15], faster region-based convolutional neural networks (Faster R-CNN)[16], and YOLOv4[17]. Each model was assessed using key performance metrics such as average precision (AP), frames per second (FPS), and giga floating point operations (GFLOPS). The PyTorch framework was used to train the network, producing models in the *.pth format. After obtaining the characteristic points of the corn crops, the Affinity Propagation clustering algorithm was used to determine the number of corn rows and the distribution of corn seedlings in each row; comparative experiments were conducted with the DBSCAN algorithm and the ordering points to identify the clustering structure (OPTICS) clustering algorithm. Finally, the Least Squares method was used to detect the crop rows. In addition, the YOLOv8 backbone network was replaced with visual geometry group (VGG)[18], ResNet[19], MobileNetV1[20], MobileNetV2[21], MobileNetV3[22], GhostNet[23], and GhostNetV2[24], respectively; comparative experiments were conducted, and the best-performing backbone was selected for YOLOv8-G.
Fig. 1 Diagram of the overall workflow of corn seedling stage object detection and crop row recognition study

1.1 Data acquisition

A corn dataset was utilized for crop row detection, and four datasets were created to capture the crop's growth under different conditions. Data were collected within 21 days after the corn seedlings emerged, because this stage is a critical period for inter-row detection: weeds compete most fiercely with the seedlings, and the need for detection is highest. After 21 days, the crop root system is well developed, the plants cover a larger area, their natural ability to suppress weeds is enhanced, and the need for inter-row detection decreases accordingly. Data 1 was collected 7 days after corn emergence in the early morning, when lighting was relatively soft and even. Data 2 was collected 14 days after emergence under strong midday sunlight, creating more complex lighting conditions that could affect crop detection. Data 3 was collected 21 days after emergence in the evening, when lighting gradually decreased, representing low-light conditions. The data for the three periods are shown in Fig. 2. These varying environmental and lighting conditions were used to evaluate the model's performance across different stages of crop development and its adaptability to changes in light intensity. Additionally, weed complexity gradually increased across these datasets, which allowed the model's robustness to different levels of weed interference to be evaluated.
Fig. 2 Data of three periods of corn seedling stage object detection and crop row recognition study
The fourth dataset, Data 4, is a comprehensive dataset that combines all images from Data 1, Data 2, and Data 3. It provides a more holistic evaluation of the model's robustness and effectiveness across multiple growth stages and varying light conditions, ensuring that the model can generalize well under diverse field scenarios. The image data was collected from Acheng Farm, Acheng District, Harbin, Heilongjiang Province, in May and June 2023, at a height of 1.5 m. A fixed shooting height of 1.5 m was chosen because it is a typical mounting height for cameras on unmanned agricultural machinery, especially small and medium-sized machines, and provides a good viewing angle and coverage.
As the agricultural machinery traverses the field, there may be bumps and changes in viewing angle. To better simulate the actual operating conditions of agricultural machinery and improve robustness, four camera angles were used for each period of data, with depression angles of 60°, 70°, 80°, and 90°, respectively. Fig. 3 shows the data from the four viewing angles. The number of images collected at each angle for each dataset is summarized in Table 1, which shows the distribution of data collected under early morning, midday, and evening conditions. The diversity of these angles simulates different agricultural machinery operation scenarios and improves the robustness and applicability of the model under various operating conditions.
Fig. 3 Data of four different viewing angles of corn seedling stage object detection and crop row recognition
Table 1 Summary of corn dataset images collected at different angles
Data | Collection time | Depression angle 60° | Depression angle 70° | Depression angle 80° | Depression angle 90° | Total number of images
Data 1 | 7 days after corn emergence | 100 | 100 | 100 | 100 | 400
Data 2 | 14 days after corn emergence | 100 | 100 | 100 | 100 | 400
Data 3 | 21 days after corn emergence | 100 | 100 | 100 | 100 | 400
Data 4 | Combination of Data 1, Data 2, and Data 3 | 300 | 300 | 300 | 300 | 1 200
The images were captured using a Canon EOS 80D camera with the following settings: ISO 400 for morning data, ISO 100 for strong light, and ISO 800 for evening data. The shutter speed was set to 1/250 s, and the aperture was f/5.6.
The pictures were partitioned into training, validation, and test sets in an 8:1:1 ratio. The training set was used to facilitate model training, the validation set assisted in model optimization and hyperparameter selection, and the test set evaluated model performance.
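A minimal sketch of such an 8:1:1 split is given below; the folder layout and file extension are hypothetical, and the paper does not state how the split was implemented, so this is illustrative only.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 42):
    """Split labelled images into train/val/test subsets at an 8:1:1 ratio."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible

    n = len(images)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)

    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

if __name__ == "__main__":
    # "data4/images" is a hypothetical folder holding the combined dataset.
    splits = split_dataset("data4/images")
    print({name: len(files) for name, files in splits.items()})
```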

1.2 Using YOLOv8-G to obtain crop feature points

The extraction of crop feature points is critical, as it directly affects the accuracy of crop line fitting. In real-world agricultural environments, unmanned agricultural machinery faces significant challenges, including varying lighting conditions and interference from weeds[25]. Accurate crop feature point extraction under these conditions is essential for reliable crop line detection and autonomous navigation. Therefore, this study focuses on developing a robust detection method that can handle such complexities while operating within the limited computational capacity of embedded agricultural systems. Target detection methods can accurately distinguish weeds from corn and obtain the coordinates of the corn plants, but the computational power of embedded agricultural machinery is very limited, so many mainstream target detection algorithms cannot be deployed directly. To address this, YOLOv8-G, a lightweight object detection algorithm, was proposed based on YOLOv8-m.
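Since the crop feature points used in the later steps are derived from the detector's output boxes, the small helper below shows one way to turn detections into center points; it assumes the boxes are already available as an (N, 4) array in (x1, y1, x2, y2) pixel format, which is not specified in the paper.

```python
import numpy as np

def boxes_to_centers(boxes_xyxy: np.ndarray) -> np.ndarray:
    """Convert (N, 4) boxes in (x1, y1, x2, y2) format into (N, 2) center points."""
    x_center = (boxes_xyxy[:, 0] + boxes_xyxy[:, 2]) / 2.0
    y_center = (boxes_xyxy[:, 1] + boxes_xyxy[:, 3]) / 2.0
    return np.stack([x_center, y_center], axis=1)
```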

1.2.1 YOLOv8-G overall structure

To make the model more suitable for crop feature point extraction in embedded agricultural systems, the YOLOv8-G approach replaces the original backbone with the lightweight GhostNetV2, which reduces computational complexity. Additionally, the number of channels in the convolutional layers was reduced, and a channel attention mechanism was introduced to balance feature importance across channels. The Focal Loss function was also employed to address class imbalance, improving performance in complex field environments. As a result of these modifications, the number of parameters was reduced to 8.20 million, compared with the original 35.75 million in YOLOv8-m.
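As an illustration of the loss component, the sketch below implements the standard binary Focal Loss of Lin et al.[26]; the weighting values alpha = 0.25 and gamma = 2.0 are the common defaults from that work, not settings reported in this paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: down-weights easy examples so that rare positives
    (e.g. sparse corn seedlings against large background areas) are not drowned out."""
    # Per-element cross entropy, kept unreduced so it can be re-weighted.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)              # probability of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)  # class balancing factor
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```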
To prevent overfitting and ensure the model's generalization ability, k-fold cross-validation was employed. The dataset was randomly divided into 10 mutually exclusive subsets (folds). Each fold was used as the validation set once, with the remaining subsets used for training. The average of the 10 evaluation results was used as the final performance metric, providing stable performance evaluation for model selection and hyperparameter tuning.
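A compact sketch of this 10-fold protocol is shown below, assuming a user-supplied train_and_eval routine (a placeholder, not part of the paper's code) that trains on one index set and returns the AP on the other.

```python
import numpy as np
from sklearn.model_selection import KFold

def run_10_fold(image_indices: np.ndarray, train_and_eval):
    """10-fold cross-validation: each fold serves as the validation set once,
    and the mean and standard deviation of the AP scores are reported."""
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    ap_scores = []
    for fold, (train_idx, val_idx) in enumerate(kf.split(image_indices)):
        ap = train_and_eval(image_indices[train_idx], image_indices[val_idx])
        ap_scores.append(ap)
        print(f"fold {fold}: AP = {ap:.2f}")
    return float(np.mean(ap_scores)), float(np.std(ap_scores))
```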
The overall structure of YOLOv8-G adopted a streamlined design, with lighter C2f modules throughout the Neck and Head. These C2f modules improved the model's ability to handle multi-scale feature fusion through a more efficient combination of convolutions and feature splitting, minimizing parameter overhead while maintaining robust detection performance. The Neck utilized up-sampling and down-sampling layers in combination with multi-conv blocks (MCB) to further optimize feature fusion across different scales. With fewer parameters and a focus on lightweight feature processing, YOLOv8-G was optimized for high-speed inference and embedded system deployment while maintaining accuracy in object detection. The architecture diagram of YOLOv8-G is shown in Fig. 4.
Fig. 4 Architecture of the proposed YOLOv8-G
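For orientation, the following is a simplified, illustrative C2f-style block in PyTorch, assuming the split/bottleneck/concatenate pattern of YOLOv8's C2f; the exact channel counts and the lighter variant used in YOLOv8-G are not specified in the paper, so this is a sketch rather than the actual module.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Convolution followed by batch norm and SiLU, the basic YOLOv8-style unit."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class C2fLike(nn.Module):
    """Split the features, refine one branch with a few small conv blocks,
    then concatenate every intermediate result and fuse with a 1x1 conv."""
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        self.hidden = c_out // 2
        self.stem = ConvBNSiLU(c_in, 2 * self.hidden, k=1)
        self.blocks = nn.ModuleList(
            ConvBNSiLU(self.hidden, self.hidden, k=3) for _ in range(n)
        )
        self.fuse = ConvBNSiLU((2 + n) * self.hidden, c_out, k=1)

    def forward(self, x):
        a, b = self.stem(x).chunk(2, dim=1)
        outs = [a, b]
        for block in self.blocks:
            outs.append(block(outs[-1]))
        return self.fuse(torch.cat(outs, dim=1))
```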

1.2.2 GhostNetV2 and channel attention

GhostNetV2, a lightweight convolutional neural network[26], serves as the backbone of YOLOv8-G in this study. The complete structure of the Ghost module is shown in Fig. 5. Its core advantage lies in reducing computational complexity through "ghost" feature maps, which significantly lower the number of parameters and operations required. Additionally, GhostNetV2 incorporates the DFC attention mechanism to enhance feature processing efficiency, making it particularly well suited to the limited computing resources of agricultural machinery. This combination enables faster and more efficient crop feature point extraction, essential for the real-time requirements of embedded systems in agricultural environments[27].
Fig. 5 GhostNetV2 architecture
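The Ghost idea itself can be sketched in a few lines of PyTorch: an ordinary convolution produces a set of primary feature maps, and a cheap depthwise convolution generates the remaining "ghost" maps. The sketch below assumes an even output channel count and omits the DFC attention branch that GhostNetV2 adds on top.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce half of the output channels with an ordinary convolution and the
    other half with a cheap depthwise convolution applied to those primary maps."""
    def __init__(self, c_in, c_out, kernel_size=1, cheap_kernel=3):
        super().__init__()
        primary = c_out // 2  # assumes c_out is even
        self.primary_conv = nn.Sequential(
            nn.Conv2d(c_in, primary, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(primary),
            nn.ReLU(inplace=True),
        )
        self.cheap_op = nn.Sequential(
            nn.Conv2d(primary, c_out - primary, cheap_kernel,
                      padding=cheap_kernel // 2, groups=primary, bias=False),
            nn.BatchNorm2d(c_out - primary),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.primary_conv(x)
        return torch.cat([y, self.cheap_op(y)], dim=1)
```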
The channel attention module used in the proposed approach consists of a global pooling layer, fully connected layer, ReLU activation, Hard Sigmoid activation, and a ResNet residual connection. Fig. 6 shows the architecture of the channel attention module. This module focuses on adjusting the importance of each channel, as traditional convolution and pooling processes assign equal importance to all channels by default. Since YOLOv8-G uses an anchor-based structure, and the importance of each anchor can vary, the relationships between channels need to be recalibrated to reflect the varying importance of different features.
Fig. 6 Architecture of the channel attention
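A possible PyTorch rendering of this channel attention module, following the textual description (global pooling, fully connected bottleneck, ReLU, Hard Sigmoid, residual connection), is shown below; the reduction ratio and the exact placement of the residual connection are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Recalibrate channel importance: global average pooling, a fully connected
    bottleneck with ReLU, a Hard Sigmoid gate, and a residual connection."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Hardsigmoid(inplace=True),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        # Residual connection: reweighted features are added to the originals,
        # so no channel is ever suppressed completely.
        return x + x * weights
```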

1.3 Crop line detection

Before detecting crop lines, it is essential to determine which crops belong to each line, as the number of crop lines in each image may vary. To address this, a clustering algorithm is needed to group the crops into lines. In this study, the Affinity Propagation algorithm was selected for clustering, and the DBSCAN and OPTICS algorithms were also tested for comparison.
Affinity Propagation, DBSCAN, and OPTICS do not require the number of clusters to be specified in advance. However, DBSCAN requires the pre-definition of a distance threshold and a minimum sample number, both of which directly affect the clustering results. OPTICS, in contrast, reduces the dependency on the distance threshold compared to DBSCAN and can provide more flexible and adaptive clustering results for samples[28-30].
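The clustering step can be sketched with scikit-learn's AffinityPropagation, which returns one row label per detected plant without a preset number of rows; the synthetic three-row example is purely illustrative, and the algorithm's preference parameter may need tuning for real field geometry.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def group_plants_into_rows(centers: np.ndarray) -> np.ndarray:
    """Cluster crop center points into rows without specifying the row count.

    `centers` is an (N, 2) array of (x, y) seedling centers from the detector;
    the returned array holds one row label per plant.
    """
    clustering = AffinityPropagation(random_state=0).fit(centers)
    return clustering.labels_

if __name__ == "__main__":
    # Three synthetic, roughly vertical rows of ten seedlings each.
    rng = np.random.default_rng(0)
    rows = [
        np.column_stack([np.full(10, x0) + rng.normal(0, 5, 10),
                         np.linspace(0, 100, 10)])
        for x0 in (100, 300, 500)
    ]
    labels = group_plants_into_rows(np.vstack(rows))
    print("number of rows found:", len(set(labels)))
```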
The unmanned agricultural machine moves slowly, so the difference between adjacent frames of the captured video stream is very small; therefore, the Least Squares method was used to detect the crop rows in the images.
Equations (1) to (4) present the least squares fitting operation, and the fitted crop line is expressed by Equation (5).

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$  (1)

$\bar{Y} = \frac{\sum_{i=1}^{n} y_i}{n}$  (2)

$k = \frac{\sum_{i=1}^{n}\left(x_i - \bar{X}\right)\left(y_i - \bar{Y}\right)}{\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^{2}}$  (3)

$b = \bar{Y} - k\bar{X}$  (4)

$y = kx + b$  (5)

where $x_i$ and $y_i$ are the x and y coordinates of the center point of the i-th crop plant, respectively; $\bar{X}$ is the mean x coordinate of all crop center points; $\bar{Y}$ is the mean y coordinate of all crop center points; k is the slope of the fitted line; and b is the intercept of the fitted line.
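The fit of Equations (1) to (5) can be written directly in NumPy, as below. Note that fitting y on x becomes ill-conditioned for nearly vertical rows in image coordinates; in that case the roles of x and y can be swapped, a choice the paper does not discuss.

```python
import numpy as np

def fit_crop_line(points: np.ndarray):
    """Least squares line fit following Equations (1)-(5).

    `points` is an (N, 2) array of (x, y) centers belonging to one crop row;
    returns the slope k and intercept b of y = k * x + b.
    """
    x, y = points[:, 0], points[:, 1]
    x_bar, y_bar = x.mean(), y.mean()                                  # Equations (1), (2)
    k = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)   # Equation (3)
    b = y_bar - k * x_bar                                              # Equation (4)
    return float(k), float(b)                                          # line of Equation (5)
```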

1.4 Evaluation index

The proposed approach uses AP, parameter quantity, FPS, and giga floating point operations (GFLOPS) as evaluation indices for the object detection algorithms. The AP value is the area under the precision-recall (PR) curve and is calculated using Equation (6). The number of parameters represents the size of the model, and FPS, given by Equation (7), represents the image processing speed of the model; FPS was measured on an Intel(R) Core(TM) i3-8100 processor with an RTX1050 graphics card. GFLOPS, equal to one billion floating-point operations, is obtained using Equation (8), where each convolution is counted as two operations. Equation (9) calculates the accuracy of crop row detection.
$AP = \frac{1}{11}\sum_{r} p(r)$  (6)

$FPS = \frac{1}{t}$  (7)

$GFLOPS = \frac{FLOPS}{10^{9}}$  (8)

$Accuracy = \frac{c}{a}$  (9)

where r is the recall value, r ∈ {0, 0.1, ..., 1.0}; p(r) is the precision corresponding to r on the PR curve; t is the time required for the model to process one image; FLOPS denotes the number of floating-point operations; c is the number of correctly detected crop rows; and a is the total number of crop rows.
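The four indices can be computed as follows; the AP routine uses the usual 11-point interpolation (the maximum precision at recall of at least r), which Equation (6) abbreviates as p(r).

```python
import numpy as np

def ap_11_point(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """11-point interpolated AP (Equation 6): average the best precision
    available at recall levels 0.0, 0.1, ..., 1.0 on the PR curve."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11.0

def fps(seconds_per_image: float) -> float:
    """Equation (7): frames processed per second."""
    return 1.0 / seconds_per_image

def gflops(flops: float) -> float:
    """Equation (8): floating point operations expressed in units of 10^9."""
    return flops / 1e9

def row_accuracy(correct_rows: int, all_rows: int) -> float:
    """Equation (9): fraction of crop rows whose plants all lie on the fitted line."""
    return correct_rows / all_rows
```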

2 Results and analysis

2.1 Ablation study of each network design in an incremental manner

To verify the modifications made to YOLOv8, GhostNetV2 was integrated into the architecture, the number of channels generated by convolution in the neck feature layers was reduced, and Channel Attention and Focal Loss were added. The algorithm's performance was evaluated on the first period of the corn dataset (Data 1). As shown in Table 2, replacing the backbone with GhostNetV2 reduced the number of parameters by 5.547 M, decreased GFLOPS by 55.77 G, and increased FPS by 3.76, though the AP value dropped slightly by 0.18%. Further reducing the number of feature-layer channels in the neck cut parameters by another 22.424 M and GFLOPS by 32.637 G and raised FPS by 11.64, but the AP value dropped by 2.07%. To ensure the algorithm became not only smaller and faster but also more accurate, Channel Attention and Focal Loss were introduced, leading to a slight increase of 0.383 M in parameters and 0.123 G in GFLOPS while improving the AP value to 98.22%. Overall, after incorporating GhostNetV2, the reduced number of feature-layer channels (DCN, see the note to Table 2), Channel Attention, and Focal Loss, the model's parameters were reduced to 22.91% of those of YOLOv8, its GFLOPS to 10.02% of YOLOv8's, and its FPS increased by 14.60, with the AP value decreasing by only 0.70%.
Table 2 Incremental ablation study on all network designs
GNV2 | DCN | FL | CA | Params/M | GFLOPS/G | FPS | AP/%
× | × | × | × | 35.785 | 98.121 | 12.53 | 98.92
✓ | × | × | × | 30.238 | 42.349 | 16.29 | 98.74
✓ | ✓ | × | × | 7.814 | 9.712 | 27.93 | 96.67
✓ | ✓ | ✓ | × | 7.824 | 9.691 | 27.21 | 97.23
✓ | ✓ | ✓ | ✓ | 8.197 | 9.835 | 27.13 | 98.22

Note: GNV2 represents GhostNetV2; DCN stands for reducing the number of feature-layer channels generated by convolution; FL stands for Focal Loss; CA stands for channel attention.

2.2 Comprehensive performance comparison of object detection models

2.2.1 Comparison of feature extraction networks

Currently, mainstream feature extraction networks include GhostNetV2, GhostNet, MobileNetV1, MobileNetV2, MobileNetV3, ResNet50, and VGG. These networks were integrated into the YOLOv8 object detection algorithm for comparative experiments. Data4 was used as the dataset, and four evaluation indices—Params, GFLOPS, FPS, and AP value—were used to assess the performance of each network on the YOLOv8 model.
As shown in Table 3, the numbers of parameters for YOLOv8-GhostNetV2, YOLOv8-GhostNet, YOLOv8-MobileNetV1, YOLOv8-MobileNetV2, and YOLOv8-MobileNetV3 are relatively close, with a maximum difference of only 2.514 M. In terms of GFLOPS, YOLOv8-GhostNetV2, YOLOv8-GhostNet, YOLOv8-MobileNetV2, and YOLOv8-MobileNetV3 show similar values, and all of these networks achieve over 15 FPS. Considering Params, GFLOPS, and FPS together, each of the lightweight backbones (GhostNetV2, GhostNet, MobileNetV1, MobileNetV2, and MobileNetV3) is a viable option. However, in terms of AP value, YOLOv8-GhostNetV2 achieves the highest at 96.50%. GhostNetV2 was therefore chosen as the backbone network for YOLOv8 after this comprehensive evaluation.
Table 3 Comparison of various backbones of YOLOv8
YOLOv8 backbone | Params/M | GFLOPS/G | FPS | AP/%
GhostNetV2 | 29.418 | 43.214 | 16.80 | 96.50
GhostNet | 28.753 | 41.836 | 18.52 | 96.45
MobileNetV1 | 30.132 | 52.146 | 18.28 | 94.80
MobileNetV2 | 27.618 | 44.913 | 18.36 | 95.76
MobileNetV3 | 28.442 | 44.352 | 18.05 | 96.05
ResNet50 | 52.496 | 113.446 | 11.43 | 96.21
VGG | 42.593 | 287.169 | 6.29 | 92.14
The introduction of GhostNetV2 into YOLOv8, along with the reduction of channels in the feature layers and the addition of the channel attention module and Focal Loss, significantly improved model performance. These modifications produced a smaller model and faster processing while maintaining high accuracy, validating the necessity and effectiveness of each improvement. The final results demonstrate that YOLOv8-G excels in target detection, especially in complex field environments, with higher accuracy and efficiency, highlighting its value for agricultural production.

2.2.2 Comparison of recognition performance of different models

The YOLOv8-G target detection algorithm was employed to extract crop feature points, and comparative experiments were conducted with YOLOv8-m, YOLOv7, YOLOv4, Faster R-CNN, and SSD, focusing on key metrics such as parameter count, GFLOPS, FPS, and AP.
As shown in Table 4, YOLOv8-G demonstrates significant efficiency improvements. With a parameter count of 8.20 M, GFLOPS of 9.36 G, and an FPS of 27.13, it outperforms the other algorithms in terms of speed and computational cost. These efficiency improvements are attributed to the use of GhostNetV2 as the backbone network, which was selected after comparison with more complex alternatives such as VGG, ResNet, and the MobileNet variants. By reducing the number of parameters and computational requirements, the YOLOv8-G model improves real-time performance, making it better suited to resource-constrained environments. Transformer-based models such as the DEtection TRansformer (DETR) were considered but not used because of their higher complexity and poor fit for the embedded systems typically found on agricultural machinery. YOLOv8-G's parameter count is only 34.7% of that of the second-smallest model, SSD, and its GFLOPS is 16.9% of that of the second-smallest, YOLOv8-m. These reductions make YOLOv8-G highly suitable for embedded agricultural systems, where processing power and memory are limited. The detection results of the different models are shown in Fig. 7.
Table 4 Comparison results of Params, GFLOPS, and FPS across different models
Model | Params/M | GFLOPS/G | FPS
YOLOv8-G | 8.20 | 9.36 | 27.13
YOLOv8-m | 35.75 | 55.48 | 23.71
YOLOv7 (standard version) | 37.62 | 59.96 | 13.53
YOLOv4 (standard version) | 63.94 | 106.47 | 10.93
Faster R-CNN + MobileNet | 26.28 | 940.92 | 3.00
SSD (standard version) | 23.61 | 68.76 | 24.07
Fig. 7 Detection results of different models of corn seedling stage object detection and crop row recognition study
To validate YOLOv8-G's generalization ability and prevent overfitting, 10-fold cross-validation was used, providing a more reliable measure of model robustness by ensuring that each data point is included in the validation set. The results, presented in Table 5, show the AP values and standard deviations for different datasets. YOLOv8-G demonstrates strong generalization across all datasets, with stable and high AP values of 98.22%, 98.15%, and 97.32% at 7, 14, and 21 days, respectively, along with small standard deviations. This highlights the model's ability to perform well under varying lighting conditions (morning, midday, and evening) and increasing weed complexity. Even as weed interference intensified from Day 7 to Day 21, YOLOv8-G maintained high AP values, confirming its robustness in handling environmental variability.
Table 5 AP values and standard deviations of 10-fold cross validation on different datasets
Dataset | AP value/% | Standard deviation
Data 1 | 98.22 | 6.50
Data 2 | 98.15 | 6.48
Data 3 | 97.32 | 6.45
Data 4 | 96.50 | 6.39
Table 6 provides a detailed performance comparison between YOLOv8-G and the other detection algorithms. For Data 1, Data 2, and Data 4, YOLOv8-G's AP values are slightly lower than those of YOLOv8 by 0.91%, 0.55%, and 0.61%, respectively, and lower than those of YOLOv7 by 1.51%, 0.35%, and 0.33%, respectively. For Data 3, YOLOv8-G ranks fourth, with an AP 1.22% lower than YOLOv8, 1.60% lower than YOLOv7, and 0.28% lower than Faster R-CNN. While YOLOv8-G gives up a small amount of average precision compared with YOLOv8 and YOLOv7, its significant gains in efficiency more than compensate for this, as shown in Table 4. Moreover, YOLOv8 focuses more on balanced overall performance. These results demonstrate that the YOLOv8-based YOLOv8-G is better suited to environments with limited processing power and memory, such as embedded agricultural systems, where efficiency is crucial.
Table 6 Performance comparison of different object detection algorithms in different corn periods detection (AP)
Model | Data 1/% | Data 2/% | Data 3/% | Data 4/%
YOLOv8-G | 98.22 | 98.15 | 97.32 | 96.50
YOLOv8 | 99.13 | 98.70 | 98.54 | 97.11
YOLOv7 | 99.71 | 98.50 | 98.92 | 96.83
YOLOv4 | 97.81 | 94.18 | 95.73 | 92.69
Faster R-CNN | 95.12 | 96.48 | 97.60 | 92.80
SSD | 91.12 | 94.41 | 95.31 | 89.51
Fig. 8 illustrates YOLOv8-G's ability to distinguish between corn and weeds, a critical factor in agricultural applications. Fig. 8b shows that YOLOv8-G accurately detects corn seedlings while excluding weeds, and Fig. 8c highlights the model's attention on crop features, unaffected by background noise.
Fig. 8 Object detection process of corn seedling stage object detection and crop row recognition study
Compared with other mainstream target detection algorithms, YOLOv8-G offers substantial advantages in computational efficiency and processing speed. Although its accuracy is slightly lower than that of YOLOv8 (by around 1%, as shown in Table 6), the improvements in speed and resource efficiency make YOLOv8-G more suitable for real-time applications in resource-limited environments such as embedded agricultural systems. This trade-off allows YOLOv8-G to meet the demands of unmanned agricultural machinery, ensuring faster and more efficient field operations in challenging environments.

2.3 Actual detection performance in complex environments

The YOLOv8-G algorithm was used to detect the position of corn crops, followed by clustering using DBSCAN, OPTICS, and Affinity Propagation algorithms. Finally, the Least Squares method was applied to detect crop lines. In this study, the Least Squares method was chosen due to its computational efficiency compared to Hough Transform. Tests were conducted on Data 1, Data 2, and Data 3 to evaluate the accuracy of these clustering algorithms, and the most suitable one for corn crop line fitting was selected. Crop line detection is considered successful if all plants in a line fall on the algorithm-generated line; otherwise, it is considered a failure. The detection results are shown in Fig. 9. As summarized in Table 7, DBSCAN and Affinity Propagation achieved accuracies of 97.07% and 96.52%, respectively, while OPTICS had a lower accuracy of 95.14%. A key limitation of DBSCAN is the difficulty in adjusting its two hyperparameters, while Affinity Propagation achieves 96.52% accuracy with default settings. Affinity Propagation's ability to cluster without predefining the number of groups simplifies the process and enhances adaptability to different field conditions. Moreover, Affinity Propagation performed best in terms of parameter count (10.8 M), computational complexity (31.7 GFLOPS), and processing speed (26.1 FPS). Considering these factors—parameter count, computational complexity, processing speed, and accuracy—Affinity Propagation is the most suitable clustering method for crop row detection in unmanned agricultural machinery.
Fig. 9 Results of crop row detection of corn seedling stage object detection and crop row recognition
Table 7 Comparison of the accuracy of three crop row detection algorithms
Algorithm | Dataset | Params/M | GFLOPS/G | FPS | Accuracy/%
DBSCAN | Data 1 | 12.5 | 35.6 | 24.5 | 96.51
DBSCAN | Data 2 | 12.5 | 35.6 | 24.5 | 98.86
DBSCAN | Data 3 | 12.5 | 35.6 | 24.5 | 95.85
DBSCAN | Data 4 | 12.5 | 35.6 | 24.5 | 97.07
OPTICS | Data 1 | 14.2 | 39.8 | 20.3 | 94.04
OPTICS | Data 2 | 14.2 | 39.8 | 20.3 | 94.51
OPTICS | Data 3 | 14.2 | 39.8 | 20.3 | 95.85
OPTICS | Data 4 | 14.2 | 39.8 | 20.3 | 95.14
Affinity Propagation | Data 1 | 10.8 | 31.7 | 26.1 | 95.27
Affinity Propagation | Data 2 | 10.8 | 31.7 | 26.1 | 97.52
Affinity Propagation | Data 3 | 10.8 | 31.7 | 26.1 | 97.31
Affinity Propagation | Data 4 | 10.8 | 31.7 | 26.1 | 96.52
In practical applications, accurate crop row detection is essential for the autonomous navigation of unmanned agricultural machinery. The proposed YOLOv8-G method, combined with the Affinity Propagation algorithm, simplifies parameter adjustment while maintaining competitive accuracy. Experimental results show that YOLOv8-G achieves a frame rate of 27.13 FPS and an AP of 96.50% on the combined dataset (Tables 4 and 5), allowing it to handle varying lighting and weed conditions in the field effectively. This not only reduces crop damage but also improves operational efficiency. The speed and accuracy of YOLOv8-G enable agricultural machinery to complete navigation tasks more quickly and accurately, reducing labor costs and raising the level of automation in agricultural production.

3 Discussion

A lightweight YOLOv8-G detection algorithm was specifically tailored for crop row detection in complex agricultural environments. The model demonstrated strong performance across various crop growth stages, particularly under challenging conditions such as inconsistent lighting and high weed interference.
The proposed approach utilizes a deep learning method to extract feature points, which, while slightly slower than traditional image processing methods[31, 32], offers higher accuracy in complex environments. One key advantage of this object detection method is its ability to effectively distinguish crops from weeds and remain robust against complex backgrounds. However, this study has several limitations. First, the YOLOv8-G algorithm, despite its lightweight design, still requires considerable computational power for real-time applications, which may be challenging for some embedded agricultural systems with limited processing capacity. Second, the algorithm's adaptability to different crop types is limited; adapting the model to new crops or significantly different environments would require additional retraining and fine-tuning, increasing deployment complexity and cost. Third, while the algorithm performs well under typical lighting variations, its performance in extreme conditions has not been extensively tested and may require further optimization. Finally, although efforts were made to reduce the model's computational complexity, handling high-resolution images or very dense weed coverage may still affect real-time performance.
Future improvements could focus on further reducing the computational demands of the model without sacrificing accuracy. This could involve exploring more lightweight architectures or hybrid models to enhance real-time performance, especially in resource-constrained environments typical of agricultural machinery. Expanding the dataset to include a wider variety of crops and environmental conditions is another promising direction, as this would improve the model's generalizability and robustness. Additionally, integrating more advanced post-processing techniques could further improve the model's precision and reduce false positives, particularly in environments with dense weed coverage.

4 Conclusions

In this study, YOLOv8-G, a lightweight target detection algorithm based on YOLOv8, was proposed specifically for agricultural machinery applications. By using GhostNetV2 as the backbone network and reducing the number of channels in the neck's convolution layers, YOLOv8-G significantly reduces the model's parameters while maintaining high detection accuracy. Focal Loss was employed to address class imbalance, which is particularly useful in agricultural environments with varying lighting conditions and weed interference.
The experimental results demonstrated that YOLOv8-G achieved AP values of 98.22%, 98.15%, and 97.32% at 7, 14, and 21 days after crop emergence, respectively, showing its robustness in detecting corn crops throughout the seedling stage. Moreover, the algorithm achieved 96.52% accuracy in crop line extraction, further validating its effectiveness under challenging field conditions.
YOLOv8-G's lightweight design and high accuracy make it ideal for embedded agricultural systems, providing reliable crop line detection and navigation support. This facilitates real-time crop row detection and navigation, enhancing the operational efficiency of unmanned agricultural machinery. By addressing challenges such as varying lighting conditions and weed interference, YOLOv8-G contributes to more accurate and reliable autonomous navigation.

COMPETING INTERESTS

All authors declare no competing interests.

1
SHI J Y, BAI Y H, DIAO Z H, et al. Row detection based navigation and guidance for agricultural robots and autonomous vehicles in row-crop fields: Methods and applications[J]. Agronomy, 2023, 13(7): ID 1780.

2
QU F H, DING T Y, ZHENG X M, et al. Verification of farmland crop row direction recognition method based on plot morphological characteristics[J]. Remote sensing technology and application, 2024, 39(5): 1213-1222.

3
BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[EB/OL]. arXiv: 2004.10934, 2020.

4
YANG Z L, YANG Y, LI C R, et al. Tasseled crop rows detection based on micro-region of interest and logarithmic transformation[J]. Frontiers in plant science, 2022, 13: ID 916474.

5
FU D B, JIANG Q, QI L, et al. Detection of the centerline of rice seedling belts based on region growth sequential clustering-RANSAC[J]. Transactions of the Chinese Society of Agricultural Engineering, 2023, 39(7): 47-57.

6
ZHAI Z Q, XIONG K, WANG L, et al. Crop row detection and tracking based on binocular vision and adaptive Kalman filter[J]. Transactions of the Chinese Society of Agricultural Engineering, 2022, 38(8): 143-151.

7
PONNAMBALAM V R, BAKKEN M, MOORE R J D, et al. Autonomous crop row guidance using adaptive multi-ROI in strawberry fields[J]. Sensors, 2020, 20(18): ID 5249.

8
LIU X, HU C H, LI P P. Automatic segmentation of overlapped poplar seedling leaves combining Mask R-CNN and DBSCAN[J]. Computers and electronics in agriculture, 2020, 178: ID 105753.

9
DE SILVA R, CIELNIAK G, WANG G, et al. Deep learning-based crop row detection for infield navigation of agri-robots[J]. Journal of field robotics, 2024, 41(7): 2299-2321.

10
DE SILVA R, CIELNIAK G, GAO J F, et al. Towards agricultural autonomy: Crop row detection under varying field conditions using deep learning[EB/OL]. arXiv:2109.08247, 2021.

11
WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2023: 7464-7475.

12
ZHENG Z Y, LI J W, QIN L F. YOLO-BYTE: An efficient multi-object tracking algorithm for automatic monitoring of dairy cows[J]. Computers and electronics in agriculture, 2023, 209: ID 107857.

13
ZHANG L M, LIU G W, QI Y D, et al. Research progress on key technologies of agricultural machinery unmanned driving system[J]. Journal of intelligent agricultural mechanization, 2022, 3(1): 27-36.

14
CUI X Y, CUI B B, MA Z, et al. Integration of geometric-based path tracking controller and its application in agricultural machinery automatic navigation[J]. Journal of intelligent agricultural mechanization, 2023, 4(3): 24-31.

15
LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]// Computer Vision–ECCV 2016: 14th European Conference. Berlin, Germany: Springer, 2016: 21-37.

16
REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 39(6): 1137-1149.

17
BOOGAARD F P, RONGEN K S A H, KOOTSTRA G W. Robust node detection and tracking in fruit-vegetable crops using deep learning and multi-view imaging[J]. Biosystems engineering, 2020, 192: 117-132.

18
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. arXiv: 1409.1556, 2014.

19
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2016: 770-778.

20
HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[EB/OL]. arXiv: 1704.04861, 2017.

21
SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, New Jersey, USA: IEEE, 2018: 4510-4520.

22
HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]// 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway, New Jersey, USA: IEEE, 2019: 1314-1324.

23
HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway, New Jersey, USA: IEEE, 2020: 1580-1589.

24
TANG Y, HAN K, GUO J, et al. GhostNetV2: Enhance cheap operation with long-range attention[J]. Advances in neural information processing systems, 2022, 35: 9969-9982.

25
SHI J Y, BAI Y H, ZHOU J, et al. Multi-crop navigation line extraction based on improved YOLOv8 and threshold-DBSCAN under complex agricultural environments[J]. Agriculture, 2023, 14(1): ID 45.

26
LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway, New Jersey, USA: IEEE, 2017: 2980-2988.

27
CHI J, GUO S, ZHANG H, et al. L-GhostNet: Extract better quality features[J]. IEEE Access, 2023, 11: 2361-2374.

28
LIU F C, YANG Y, ZENG Y M, et al. Bending diagnosis of rice seedling lines and guidance line extraction of automatic weeding equipment in paddy field[J]. Mechanical systems and signal processing, 2020, 142: ID 106791.

29
GARCÍA-SANTILLÁN I D, PAJARES G. On-line crop/weed discrimination through the Mahalanobis distance from images in maize fields[J]. Biosystems engineering, 2018, 166: 28-43.

30
DIAO Z H, GUO P L, ZHANG B H, et al. Maize crop row recognition algorithm based on improved UNet network[J]. Computers and electronics in agriculture, 2023, 210: ID 107940.

31
NAN Y L, ZHANG H C, ZENG Y, et al. Intelligent detection of Multi-Class pitaya fruits in target picking row based on WGB-YOLO network[J]. Computers and electronics in agriculture, 2023, 208: ID 107780.

32
CHEN J Q, QIANG H, WU J H, et al. Navigation path extraction for greenhouse cucumber-picking robots using the prediction-point Hough transform[J]. Computers and electronics in agriculture, 2021, 180: ID 105911.
