Accurate Detection of Tree Planting Locations in Inner Mongolia for The Three North Project Based on YOLOv10-MHSA

doi:10.12133/j.smartag.SA202410010

Abstract

[Objective] Traditional manual field surveys of tree planting locations are inefficient and error-prone, and low-altitude unmanned aerial vehicles (UAVs) have become the preferred way to address these problems. To improve the accuracy and efficiency of detecting tree planting locations (tree pits) in the Inner Mongolia section of China's Three North Project, a recognition and detection model based on YOLOv10-MHSA was proposed.
[Methods] A long-endurance, multi-purpose vertical take-off and landing (VTOL) fixed-wing UAV was used to collect images of tree planting locations. Equipped with a 26-megapixel camera with high spatial resolution, the UAV was well suited to high-precision field mapping. Aerial photography was conducted between 11:00 and 12:00 on August 1, 2024. Flight parameters were set as follows: altitude of 150 m (yielding a ground resolution of approximately 2.56 cm), forward overlap of 75%, side overlap of 65%, and flight speed of 20 m/s. To prevent overfitting during network training, the original dataset was augmented. To improve the quality and efficiency of model training, different attention mechanisms and optimized loss functions were introduced. Specifically, the EIOU loss function was introduced, comprising three components: IOU loss, distance loss, and aspect (width-height) loss. This function directly minimizes the width and height discrepancies between the target box and the anchor box, leading to faster convergence and more accurate localization. Additionally, the Focal-EIOU loss function was adopted to address sample imbalance in bounding box regression, further improving the model's convergence speed and localization precision.
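The three-part loss described above can be sketched for a single pair of boxes. This is a minimal illustration in plain Python, assuming the commonly published EIOU formulation (an overlap term, a centre-distance term, and a width-height term, the latter two normalized by the smallest enclosing box) and a focal weight of IoU raised to a power gamma. The function names, the (x1, y1, x2, y2) box format, and gamma = 0.5 are assumptions for illustration, not settings reported by the authors.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIOU loss for two boxes given as (x1, y1, x2, y2). Returns (loss, iou)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Overlap term: intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Distance term: squared centre distance over squared enclosing diagonal.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    dist = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width-height term: each gap is normalized by the enclosing box size.
    aspect = (pw - tw) ** 2 / (cw * cw + eps) + (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + dist + aspect, iou


def focal_eiou_loss(pred, target, gamma=0.5):
    """Focal-EIOU: the EIOU loss re-weighted by IoU ** gamma (gamma is an assumed value)."""
    loss, iou = eiou_loss(pred, target)
    return (iou ** gamma) * loss


# Two 2x2 boxes offset by one unit in each direction (IoU = 1/7).
print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3))[0])    # approximately 0.968
print(focal_eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # approximately 0.366
```

In this sketch the IoU ** gamma factor scales down the loss of boxes with little overlap, so the regression signal is dominated by higher-quality boxes, which is one way to counter the imbalance between the many poor candidates and the few good ones.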
[Results and Discussions] After the multi-head self-attention (MHSA) mechanism was introduced, AP@0.5 and AP@0.5:0.95 improved by 1.4% and 1.7%, respectively, relative to the YOLOv10n baseline, and precision and recall also improved. This indicates that MHSA helps the model extract target features and improves detection accuracy against complex backgrounds. Although processing speed decreased slightly after the attention mechanism was added, the model still met the requirements of real-time detection. Four loss functions were compared: CIOU, SIOU, EIOU, and Focal-EIOU. Relative to CIOU, the Focal-EIOU loss function raised precision from 0.923 to 0.938 and recall from 0.876 to 0.885. This demonstrates that Focal-EIOU can accelerate model convergence and improve localization accuracy when dealing with sample imbalance in small target detection. Finally, an improved model, YOLOv10-MHSA, was proposed, incorporating the MHSA mechanism, a small target detection layer, and the Focal-EIOU loss function. Ablation experiments showed that adding only the small target detection layer to YOLOv10n increased AP@0.5 and AP@0.5:0.95 by 2.2% and 0.9%, respectively, and also improved precision and recall. When MHSA and the Focal-EIOU loss function were further added, detection performance improved again. Compared with the baseline YOLOv10n, AP@0.5, AP@0.5:0.95, precision (P), and recall (R) improved by 6.6%, 10.0%, 4.1%, and 5.1%, respectively. Although the FPS decreased, the improved model clearly outperformed the original model across a range of complex scenes, especially for small targets in densely distributed and occluded scenes.
[Conclusions] By introducing MHSA and an optimized loss function (Focal-EIOU) into the YOLOv10n model, this study improved the accuracy and efficiency of tree planting location detection for the Three North Project in Inner Mongolia. The experimental results show that MHSA enhances the model's ability to extract local and global information about targets against complex backgrounds and effectively reduces missed and false detections. The Focal-EIOU loss function accelerates convergence and improves localization accuracy by mitigating sample imbalance in bounding box regression. Although processing speed decreased, the proposed method still meets real-time detection requirements and provides strong technical support for scientific afforestation in the Three North Project.
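How self-attention lets every position draw on both nearby and distant context can be illustrated with a small sketch. The code below is a generic multi-head self-attention layer written in plain Python for readability rather than speed; it is not the authors' YOLOv10-MHSA module. The token layout (one row per feature-map position), the random weights, and the sizes used are all assumptions for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def mhsa(tokens, w_q, w_k, w_v, w_o, num_heads):
    """tokens: N x C matrix, one row per feature-map position.

    Returns the N x C output and one N x N attention map per head.
    """
    c = len(tokens[0])
    d = c // num_heads  # channels handled by each head
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)

    head_outputs, attn_maps = [], []
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        qh = [row[sl] for row in q]
        kh = [row[sl] for row in k]
        vh = [row[sl] for row in v]
        # Scaled dot-product scores between every pair of positions.
        scores = matmul(qh, [list(col) for col in zip(*kh)])
        attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
        attn_maps.append(attn)
        head_outputs.append(matmul(attn, vh))

    # Concatenate the heads channel-wise, then apply the output projection.
    concat = [sum((ho[i] for ho in head_outputs), []) for i in range(len(tokens))]
    return matmul(concat, w_o), attn_maps


random.seed(0)
N, C, heads = 6, 4, 2  # 6 positions, 4 channels, 2 heads (illustrative sizes)
tokens = [[random.gauss(0, 1) for _ in range(C)] for _ in range(N)]
weights = [[[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C)] for _ in range(4)]
out, attn_maps = mhsa(tokens, *weights, num_heads=heads)
print(len(out), len(out[0]), len(attn_maps))  # 6 4 2
```

Each row of an attention map is a set of weights over all positions that sums to 1, so the output at one position mixes information from the whole feature map; separate heads can learn different weightings, for example one emphasizing neighbouring positions and another emphasizing distant ones.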

Key words: tree planting locations, complex background, unmanned aerial vehicle, small target detection, YOLOv10

XIE Jiyuan, ZHANG Dongyan, NIU Zhen, CHENG Tao, YUAN Feng, LIU Yaling. Accurate Detection of Tree Planting Locations in Inner Mongolia for The Three North Project Based on YOLOv10-MHSA[J]. Smart Agriculture, 2025, 7(3): 108-119.

Model	Network depth	Network width	AP@0.5	AP@0.5:0.95	P	R	Parameters/M
YOLOv10n	0.33	0.25	0.921	0.761	0.923	0.876	5.3
YOLOv10s	0.33	0.50	0.933	0.796	0.938	0.881	11.2
YOLOv10m	0.67	0.75	0.939	0.846	0.947	0.886	31.3
YOLOv10b	1.00	1.00	0.951	0.854	0.956	0.894	56.8

Model	AP@0.5	AP@0.5:0.95	P	R	FPS/(f/s)
YOLOv10n	0.921	0.761	0.923	0.876	134
+SA	0.925	0.752	0.931	0.887	132
+EMSA	0.931	0.763	0.942	0.854	126
+MHSA	0.934	0.774	0.938	0.886	130

Loss function	AP@0.5	AP@0.5:0.95	P	R	FPS/(f/s)
CIOU	0.921	0.761	0.923	0.876	134
SIOU	0.921	0.757	0.932	0.892	132
EIOU	0.928	0.762	0.941	0.871	124
Focal-EIOU	0.931	0.776	0.938	0.885	128

Model	AP@0.5	AP@0.5:0.95	P	R	FPS/(f/s)
YOLOv10n	0.921	0.761	0.923	0.876	134
+Small target detection layer	0.941	0.768	0.932	0.882	119
+AKConv	0.946	0.784	0.937	0.897	124
+MHSA	0.934	0.774	0.938	0.886	130
+Focal-EIOU loss	0.931	0.776	0.938	0.885	128
YOLOv10-MHSA	0.982	0.837	0.961	0.921	109
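
The percentage gains quoted in the abstract appear to be relative changes over the YOLOv10n baseline rather than percentage-point differences. A minimal check, using only the first and last rows of the table above (the variable and function names are illustrative):

```python
# Baseline (YOLOv10n) and improved (YOLOv10-MHSA) rows from the ablation table.
baseline = {"AP@0.5": 0.921, "AP@0.5:0.95": 0.761, "P": 0.923, "R": 0.876}
improved = {"AP@0.5": 0.982, "AP@0.5:0.95": 0.837, "P": 0.961, "R": 0.921}


def relative_gain(new, old):
    """Relative change of new over old, in percent."""
    return (new - old) / old * 100.0


gains = {name: round(relative_gain(improved[name], baseline[name]), 1) for name in baseline}
print(gains)  # {'AP@0.5': 6.6, 'AP@0.5:0.95': 10.0, 'P': 4.1, 'R': 5.1}

# The same calculation for speed: 134 f/s down to 109 f/s.
fps_change = round(relative_gain(109, 134), 1)
print(fps_change)  # -18.7
```

The four accuracy figures match the 6.6%, 10.0%, 4.1%, and 5.1% reported in the abstract, while the frame rate drops by roughly 19% over the same comparison.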

Model	AP@0.5	AP@0.5:0.95	P	R	FPS/(f/s)
YOLOv5s	0.897	0.698	0.841	0.812	138
YOLOv8n	0.915	0.734	0.867	0.795	121
YOLOv10n	0.921	0.761	0.923	0.876	134
SSD	0.784	0.624	0.792	0.743	67
Faster-R-CNN	0.837	0.703	0.823	0.802	58
YOLOv10-MHSA	0.982	0.837	0.961	0.921	109