
Smart Agriculture


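Accurate Detection of Tree Planting Sites in the Inner Mongolia Section of the Three North Project Based on YOLOv10-MHSA

XIE Jiyuan1,2, ZHANG Dongyan1,2, NIU Zhen1,2, CHENG Tao1,2, YUAN Feng3, LIU Yaling3

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
  2. Shaanxi Key Laboratory of Agriculture Information Perception and Intelligent Service, Yangling 712100, Shaanxi, China
  3. National Grass Technology Innovation Center (Preparation), Hohhot 010021, Inner Mongolia, China
  • Received: 2024-10-16 Published: 2025-01-24
  • Funding:
    Science and Technology Program of Inner Mongolia Autonomous Region (2023JBGS000804); Hohhot Science and Technology Innovation Field Talent Program (2023RC-高层次-7); Hohhot Basic Research and Applied Basic Research Program (2024-规-基-34)
  • About the author:

    XIE Jiyuan, research interest: UAV remote sensing. E-mail:

  • Corresponding author:
    ZHANG Dongyan, Ph.D., professor, research interests: smart forestry and grassland science. E-mail: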

Accurate Detection of Tree Planting Locations in Inner Mongolia for The Three North Project Based on YOLOv10-MHSA

XIE Jiyuan1,2, ZHANG Dongyan1,2, NIU Zhen1,2, CHENG Tao1,2, YUAN Feng3, LIU Yaling3

  1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
  2. Shaanxi Key Laboratory of Agriculture Information Perception and Intelligent Service, Yangling 712100, China
  3. National Grass Technology Innovation Center (Preparation), Hohhot 010021, China
  • Received: 2024-10-16 Online: 2025-01-24
  • Foundation items: Science and Technology Program of Inner Mongolia Autonomous Region (2023JBGS000804); Hohhot Science and Technology Innovation Field Talent Program (2023RC-高层次-7); Hohhot Basic Research and Applied Basic Research Program (2024-规-基-34)
  • About author:

    XIE Jiyuan, E-mail:

  • Corresponding author:
    ZHANG Dongyan, E-mail:

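Abstract (Chinese version):

[Objective] Tree planting sites (tree pits) in the Inner Mongolia section of the Three North Project, when imaged from an unmanned aerial vehicle (UAV) platform, are easily missed or falsely detected because of complex backgrounds (shrubs, weed clusters, bare sandy soil, undulating terrain, etc.). To address this, a small target detection model for this scenario was built: YOLOv10-MHSA (You Only Look Once version 10-Multi-head Self-attention). [Methods] With YOLOv10 as the baseline model, a new small target detection layer was added to strengthen the network's capture of semantic information for small targets and improve the accuracy of their feature description; the variable-kernel convolution AKConv (Adaptive Kernel Convolution) was introduced so that the model focuses more precisely on features in the input image; a feature-fusing multi-head self-attention (MHSA) mechanism was constructed to obtain effective features under complex environmental factors; and Focal-EIOU Loss (Focal Efficient Intersection over Union Loss) replaced the original CIOU Loss (Complete Intersection over Union Loss) as the bounding-box regression loss, improving the model's convergence and the accuracy of bounding-box prediction. Finally, the two factors with the greatest influence on recognition accuracy, the distribution density of planting sites and the lighting conditions, were used to verify the robustness of the proposed model. [Results and Discussions] On the experimental dataset, YOLOv10-MHSA reached an average recognition precision of 96.1% and a detection accuracy of 92.1%, which are 4.1% and 5.1% higher than the original model, respectively, at a processing rate of 109 frames/s. This meets the accuracy and speed requirements for real-time UAV recognition of tree planting sites (tree pits) in the Inner Mongolia section of the Three North Project. [Conclusions] The proposed YOLOv10-MHSA detection model effectively improves small target detection accuracy in complex backgrounds while maintaining high performance, providing a new method for UAV-based remote-sensing detection of tree planting sites in the planting management of the Three North shelterbelt in Inner Mongolia.

Key words (Chinese version): tree planting sites, complex background, unmanned aerial vehicle, small target detection, YOLOv10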

Abstract:

[Objective] This study aims to improve the accuracy and efficiency of detecting tree planting sites (tree pits) in the Inner Mongolia section of China's 'Three North Project'. Traditional manual field surveys of tree planting sites are inefficient and error-prone, and low-altitude unmanned aerial vehicles (UAVs) have become the preferred means of addressing these problems. To this end, the research team proposed a model for accurate recognition and detection of tree planting sites based on YOLOv10-MHSA. [Methods] A long-endurance, multi-purpose, vertical take-off and landing fixed-wing UAV was used to collect images of tree planting sites. The UAV carried a 26-megapixel camera with high spatial resolution, suitable for high-precision field mapping. Aerial photography was carried out from 11:00 to 12:00 on August 1, 2024, in sunny weather with a wind force of 3. The flight height was set to 150 m (ground resolution of about 2.56 cm), the forward overlap was 75%, the side overlap was 65%, and the flight speed was 20 m/s. After image acquisition, the aerial images were stitched with Metashape software (v2.1.0) to generate a digital orthophoto map (DOM) covering about 2 000 mu (880 m×1 470 m) of tree planting sites, which was cut with a 640-pixel sliding window into 3 102 high-definition RGB images of 640×640 pixels for subsequent detection and analysis. To prevent overfitting during network training, the research team augmented and partitioned the original dataset. Increasing the amount of training data, introducing different attention mechanisms, and optimizing the loss function improved the quality and efficiency of model training.
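The sliding-window tiling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `sliding_windows` and `metres_to_pixels` are hypothetical, the stride is assumed equal to the tile size (no overlap), and partial tiles at the right and bottom edges are assumed to be skipped, since the abstract does not say how edges or empty tiles were handled. It is therefore not expected to reproduce the reported count of 3 102 images.

```python
from typing import Iterator, Tuple

# A window is (left, top, right, bottom) in pixel coordinates.
Window = Tuple[int, int, int, int]


def sliding_windows(width: int, height: int,
                    tile: int = 640, stride: int = 640) -> Iterator[Window]:
    """Yield full tile-sized windows over a width x height image.

    Assumption: partial tiles at the right and bottom edges are skipped.
    """
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            yield (left, top, left + tile, top + tile)


def metres_to_pixels(length_m: float, gsd_cm: float) -> int:
    """Convert a ground distance to pixels at a given ground resolution."""
    return round(length_m * 100 / gsd_cm)


# Small hypothetical image: 2000 x 1300 pixels gives 3 columns x 2 rows.
windows = list(sliding_windows(2000, 1300))
print(len(windows))   # 6
print(windows[0])     # (0, 0, 640, 640)

# Approximate pixel width of the 880 m side at the stated 2.56 cm resolution.
print(metres_to_pixels(880, 2.56))
```

Each yielded window can be passed to an image library's crop routine to write out one 640×640 tile; a smaller stride would produce overlapping tiles.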
A more effective EIOU loss function was introduced. It consists of three parts: IOU loss, distance loss, and aspect (width-height) loss, and it directly minimizes the width and height differences between the target box and the anchor box, giving faster convergence and better localization. In addition, the Focal-EIOU loss function was introduced to address sample imbalance in the bounding-box regression task, further improving the convergence speed and localization accuracy of the model. [Results and Discussions] After the multi-head self-attention (MHSA) mechanism was introduced, the model improved by 1.4% and 1.7% on AP@0.5 and AP@0.5:0.95, respectively, and precision and recall also improved. This indicates that MHSA helps the model extract target feature information and improves detection accuracy in complex backgrounds. Although processing speed decreased slightly after the attention mechanism was added, the decrease was small and the model still met the requirements of real-time detection. For loss-function optimization, the experiment compared four loss functions: CIOU, SIOU, EIOU, and Focal-EIOU. Detection performance improved with the Focal-EIOU loss function, and precision and recall also improved significantly. This indicates that the Focal-EIOU loss function accelerates model convergence and improves localization accuracy when dealing with sample imbalance in small target detection. Although processing speed was slightly reduced, the model still met the requirements of real-time detection. Finally, an improved model, YOLOv10-MHSA, was proposed, which introduces the MHSA attention mechanism, a small target detection layer, and the Focal-EIOU loss function.
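To make the three-part structure of the EIOU loss concrete, the sketch below computes it for a single pair of boxes and then applies the focal weighting. It follows the commonly published formulation (the IOU term, the squared centre distance normalized by the enclosing-box diagonal, and the squared width and height differences normalized by the enclosing-box width and height, with the result weighted by IOU raised to a power `gamma`). The function names and the default `gamma=0.5` are assumptions for illustration; the abstract does not state the value used in this study, and a training implementation would operate on batched tensors.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(pred: Box, target: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    return inter / (area_p + area_t - inter)


def eiou_loss(pred: Box, target: Box) -> float:
    """EIOU = IOU loss + centre-distance loss + width/height loss."""
    # Width and height of the smallest box enclosing both boxes.
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    # Squared distance between box centres, normalized by the diagonal.
    dx = ((pred[0] + pred[2]) - (target[0] + target[2])) / 2
    dy = ((pred[1] + pred[3]) - (target[1] + target[3])) / 2
    distance = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # Squared width and height differences, normalized separately.
    dw = (pred[2] - pred[0]) - (target[2] - target[0])
    dh = (pred[3] - pred[1]) - (target[3] - target[1])
    aspect = dw * dw / (cw * cw) + dh * dh / (ch * ch)
    return 1.0 - iou(pred, target) + distance + aspect


def focal_eiou_loss(pred: Box, target: Box, gamma: float = 0.5) -> float:
    """Weight the EIOU loss by IOU**gamma (gamma is an assumed default)."""
    return iou(pred, target) ** gamma * eiou_loss(pred, target)


pred_box, gt_box = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
print(eiou_loss(pred_box, gt_box))        # 1 - 1/7 + 1/9 + 0
print(focal_eiou_loss(pred_box, gt_box))
```

With this weighting, box pairs with very low overlap contribute less to the regression loss, which is how the focal form is usually described as handling the imbalance between high- and low-quality samples.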
The ablation experiments showed that adding only the small target detection layer to YOLOv10n increased AP@0.5 and AP@0.5:0.95 by 2.1% and 0.9%, respectively, and also significantly improved precision and recall. Further adding MHSA and the Focal-EIOU loss function significantly improved detection. Compared with the baseline model YOLOv10n, AP@0.5, AP@0.5:0.95, precision (P), and recall (R) improved by 6.6%, 9.8%, 4.4%, and 5.1%, respectively. Although the frame rate fell to 109 FPS, the improved model performed significantly better than the original model in various complex scenes, especially for small target detection in densely distributed and occluded scenes. [Conclusions] This study improved the YOLOv10n model by introducing MHSA and the optimized loss function (Focal-EIOU), which significantly improved the accuracy and efficiency of tree planting site detection in the 'Three North Project' in Inner Mongolia. The experimental results show that MHSA enhances the model's ability to extract local and global information about targets in complex backgrounds and effectively reduces missed and false detections. The Focal-EIOU loss function accelerates convergence and improves localization accuracy by addressing sample imbalance in the bounding-box regression task. Although processing speed declined, the model still meets real-time detection requirements and provides strong technical support for scientific afforestation in the 'Three North Project'.
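The way multi-head self-attention lets every position in a feature map draw on both nearby and distant positions can be illustrated with a generic NumPy sketch. This is standard scaled dot-product attention, not the feature-fusing MHSA module built in this study; the feature-map size (8×8×32), the number of heads (4), and the random projection weights are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Generic MHSA over x of shape (tokens, dim); weights are (dim, dim)."""
    n, d = x.shape
    assert d % num_heads == 0, "dim must be divisible by num_heads"
    head_dim = d // num_heads

    def split(t):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return t.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Every token scores every other token, so context is global.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)           # (heads, tokens, tokens)
    out = attn @ v                            # (heads, tokens, head_dim)
    merged = out.transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, attn


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 32))   # hypothetical H x W x C
tokens = feature_map.reshape(-1, 32)            # one token per pixel
weights = [rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(4)]
out, attn = multi_head_self_attention(tokens, *weights, num_heads=4)
print(out.shape, attn.shape)   # (64, 32) (4, 64, 64)
```

The attention matrix grows with the square of the number of tokens, which is consistent with the reported drop in frame rate after the attention mechanism is added.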

Key words: tree planting sites, complex background, unmanned aerial vehicle, small target detection, YOLOv10

CLC number: