Soybean Yield Estimation Method Based on Multi-Source Data Fusion

YIN Qiwei; HE Yan; WANG Zongli; RAO Yuan

doi:10.12133/j.smartag.SA202512004

Smart Agriculture, 2026: 1-17


YIN Qiwei¹,², HE Yan¹,², WANG Zongli¹, RAO Yuan³


1. College of Biological Engineering, Xianning Vocational Technical College, Xianning 437000, China
2. Engineering College, Heilongjiang Bayi Agricultural University, Daqing 163000, China
3. School of Information and Computer Science, Anhui Agricultural University, Anhui 230000, China

HE Yan, E-mail: 1175850651@qq.com

YIN Qiwei, M.S., research interest: remote sensing image processing. E-mail: 2694539535@qq.com

Received date: 2025-12-05

Online published: 2026-05-22

Supported by

National Modern Agricultural Industry Technology System Project (CARS-04-PS30)

2026 Hubei Provincial Natural Science Foundation (JCZRLH202601007)


Abstract

[Objective] The aim is to develop a high-precision, large-scale soybean yield estimation framework by integrating multi-source remote sensing data, addressing the critical need for accurate and timely crop production monitoring, and to support precision agriculture and food security decision-making at both field and regional scales. [Methods] Based on multi-source data fusion theory, a soybean yield precision prediction method integrating Unmanned Aerial Vehicle (UAV) and Sentinel-2 multi-source data was proposed. Jianshan Farm was chosen as the study area, and a total of 127 Sentinel-2 images and 47 UAV images covering the whole soybean growth period (May-September) in 2022-2023 were collected. According to the image acquisition time, the Savitzky-Golay filter was adopted to construct half-monthly and monthly synthetic images for UAV and Sentinel-2 data, which guaranteed temporal consistency of multi-source data and reduced the influence of cloud contamination. Meanwhile, three multi-scale spatial feature fusion strategies for UAV and satellite imagery were designed, and three improved Convolutional Neural Network models were constructed by coupling the above fusion methods. These models were subsequently trained and evaluated on the constructed time-series multi-feature datasets. [Results and Discussions] The Long Short-Term Memory-based fusion method obtained the optimal accuracy with R² of 0.93 among three spatial feature fusion schemes. Furthermore, the Spatial Attention-Gated Recurrent Unit-Convolutional Neural Network model exhibited the best soybean yield estimation capability, with its R² reaching 0.94. [Conclusions] The UAV-Sentinel-2 multi-source data fusion framework can effectively improve the accuracy of regional soybean yield estimation, and the optimized fusion algorithm and improved model in this study can provide a reliable technical reference for large-area and high-precision crop yield monitoring.

Key words: UAV; Sentinel-2; multi-source data fusion; yield; soybean

Cite this article

YIN Qiwei, HE Yan, WANG Zongli, RAO Yuan. Soybean Yield Estimation Method Based on Multi-Source Data Fusion[J]. Smart Agriculture, 2026: 1-17. DOI: 10.12133/j.smartag.SA202512004

0 Introduction

Remote sensing-based soybean yield prediction relies on the canopy-level spectral response of the crop^[1], so the quality of the model directly determines the reliability of the estimates^[2]. Yet high accuracy and broad coverage are hard to obtain from a single platform. Unmanned aerial vehicles (UAVs) fly low and capture fine spatial detail, but they cover only a few plots per flight, so scaling them up to a whole farm or county is impractical. Satellites cover large areas in a single pass, yet cloud, haze and fog routinely degrade the images, and fixed revisit cycles mean that a clear image may not be available exactly when it is needed^[3].

With advances in remote sensing technology, researchers now have access to images of the same area at many spatial, spectral and temporal resolutions, collectively called multi-source remote sensing datasets^[4]. LIAO et al.^[5] fused Landsat-8 and MODIS data in eastern Ontario for soybean yield estimation, but their work stayed at field scale. WANG et al.^[6] used the Spatial and Temporal Non-Local Filter-based Fusion Model to blend Sentinel-2A and MODIS observations for summer maize simulation, achieving an R² of 0.84. YANG et al.^[7] fused RGB, multispectral and thermal UAV imagery for wheat yield prediction; their best combination (RGB-MS-Texture-TIR) reached an R² of 0.660 and a Root Mean Square Error (RMSE) of 0.754. These results showed promise, yet each had a blind spot: satellite-only fusion still carried geometric, spectral and spatial mismatches that blunted prediction accuracy, while UAV-only setups delivered detail but could not scale up to regional monitoring. YANG^[8] built a UAV-satellite cross-scale system for potato growth parameter retrieval, using UAV-derived pixel-level data to calibrate satellite models. The approach is promising, but it was tailored to potato, and crop-specific phenology and spectral traits make it hard to transfer directly to soybean. Overall, UAV-satellite fusion work is still sparse, and resolution gaps have largely blocked effective integration.

VAN HOUT^[9] showed that crops change continuously through the season, so a single snapshot misses the dynamics. ZHOU et al.^[10] and HAN et al.^[11] used trapezoidal integration of multi-temporal vegetation indices to show that UAV-based multi-date indices improve rice and summer maize yield estimates well beyond single-stage data. LU et al.^[12] analysed the correlation between winter wheat yield and several vegetation indices acquired on multiple dates, and found that the correlation strengthened as the crop developed, peaking at 0.7 during the booting stage (March 26). ZHAO et al.^[13] used multi-temporal multispectral UAV canopy imagery of 266 wheat cultivars to build yield prediction models from single-stage and multi-stage information; the best result came from a Random Forest model based on five growth stages, with an R² of 0.834. Multi-temporal remote sensing data are therefore essential for improving the precision of crop yield estimation. Nonetheless, systematic investigation of the optimal temporal granularity and of strategies for combining time windows remains scarce^[14].
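The trapezoidal integration of a multi-temporal vegetation index mentioned above can be illustrated with a minimal sketch. The function name `integrated_index`, the acquisition dates and the NDVI values are all hypothetical and serve only to show the arithmetic; the cited studies may define their integration windows differently.

```python
from datetime import date


def integrated_index(dates, values):
    """Trapezoidal area under a vegetation-index curve, in index-days."""
    if len(dates) != len(values) or len(dates) < 2:
        raise ValueError("need at least two matching (date, value) pairs")
    pairs = sorted(zip(dates, values))  # order observations by date
    total = 0.0
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        # Area of one trapezoid: mean of the two readings times the gap in days.
        total += 0.5 * (v0 + v1) * (d1 - d0).days
    return total


# Hypothetical NDVI readings for one plot (illustrative values only).
obs_dates = [date(2022, 6, 1), date(2022, 6, 16), date(2022, 7, 1), date(2022, 7, 16)]
ndvi = [0.30, 0.50, 0.70, 0.80]
print(integrated_index(obs_dates, ndvi))
```

A single-date index would keep only one of these readings, whereas the integrated value summarises the whole observed trajectory in one feature.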

However, despite the above achievements, several critical issues still restrict the accuracy and applicability of soybean yield estimation. First, the contradiction between high precision and large-scale monitoring remains unsolved: UAV data provides fine details but lacks regional coverage, while satellite data achieves wide coverage but suffers from cloud contamination and coarse resolution. Second, most existing studies ignore the optimal selection of temporal granularity and lack quantitative comparisons between different time windows, leading to incomplete capture of crop growth dynamics. Third, current multi-source fusion methods mostly stay at the level of data splicing, failing to realize deep feature fusion and temporal dependency modeling, resulting in limited model performance. To address these gaps, a high-precision soybean yield estimation framework is constructed in this study based on UAV-Sentinel-2 multi-source data fusion, with the aim of achieving effective spatiotemporal matching, deep feature integration and accurate large-scale yield prediction.

A multi-source data fusion algorithm was adopted to integrate UAV and satellite multi-temporal remote sensing observations in this study, on the basis of which an enhanced convolutional neural network (CNN) framework for soybean yield estimation was developed. By systematically evaluating the effects of diverse fusion methodologies and critical temporal phases on model performance, the optimal fusion strategy and most informative phenological windows were identified, thereby facilitating accurate soybean yield forecasting for Jianshan Farm.

1 Materials and methods

1.1 Study area

Field experiments were carried out at Jianshan Farm, Heihe, Heilongjiang, a core soybean-producing zone referred to as China's "Soybean Capital". The farm spans 125°19′53″E to 125°47′15″E and 48°46′55″N to 49°1′18″N, with annual precipitation of roughly 500-600 mm falling mostly in July and August. Its cold-temperate continental monsoon climate and contiguous, ridge-based cultivation pattern make it representative of large-scale mechanised soybean production in Northeast China.

1.2 Data

1.2.1 Field data acquisition

Field surveys were conducted in the experimental area of Jianshan Farm to ensure accurate data annotation during remote sensing image processing. Sampling was conducted from September 15 to September 30, 2022, and from September 5 to September 13, 2023. The surveys focused on collecting yield information from four plots: North 13 Plot, North 14 Plot, North 15 Plot and North 9 Plot, with areas of 50.4, 55.4, 53.4 and 51.2 hm², respectively, totaling 210.4 hm². Manual sampling followed a grid method with five sampling points per grid (Fig. 1), and the latitude, longitude and yield of each point were recorded. In total, 770 sampling points were collected: 194 in North 13 Plot, 195 in North 14 Plot, 180 in North 15 Plot and 201 in North 9 Plot. At each sampling point, an area of 1 m² was selected, and the total number of soybean plants and the number of pods on each plant were counted. The total number of soybeans within the area was then calculated, and the hundred-grain weight was measured with a balance. Finally, soybean yield was calculated using Equation (1).

$$T = \frac{S \times B \times Q \times 0.85 \times 667}{10\,000} \tag{1}$$
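As a quick arithmetic check, Equation (1) can be evaluated directly. The sketch below only reproduces the expression as printed; the symbols S, B and Q are not defined in this section, so the function `yield_from_eq1` and its placeholder inputs are assumptions for illustration, not measured values.

```python
def yield_from_eq1(s, b, q):
    """Evaluate Equation (1): T = S * B * Q * 0.85 * 667 / 10 000."""
    return s * b * q * 0.85 * 667 / 10_000


# Placeholder inputs (all ones) expose the constant factor in Equation (1).
constant_factor = yield_from_eq1(1.0, 1.0, 1.0)
print(constant_factor)
```

With all three inputs set to one, the result is the constant multiplier 0.85 × 667 / 10 000, which is approximately 0.0567; T scales linearly with each of S, B and Q.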

Table 1 Band information of acquired Sentinel-2 imagery

Band number	Band	Central wavelength/μm	Bandwidth/nm	Spatial resolution/m
1	Coastal aerosol	0.443	20	60
2	Blue	0.490	65	10
3	Green	0.560	35	10
4	Red	0.665	30	10
5	Vegetation red edge	0.705	15	20
6	Vegetation red edge	0.740	15	20
7	Vegetation red edge	0.783	20	20
8	Near infrared	0.842	115	10
8A	Vegetation red edge	0.865	20	20
9	Water vapour	0.945	20	60
10	Short-wave infrared (SWIR), cirrus	1.375	20	60
11	SWIR	1.610	90	20

Table 2 GLCM texture parameters and their physical significance for soybean canopy characterization

Texture feature parameter	Abbreviation	Physical significance
Mean texture	Mean	Describes the brightness and darkness of the image
Homogeneity texture	Hom	Evaluates the local grayscale uniformity of the image
Dissimilarity texture	Dis	Characterizes the differences in texture features between pixels
Entropy texture	Ent	Indicates the amount of information in the image
Second Moment texture	Sm	Describes the uniformity and coarseness of the texture
Correlation texture	Corr	Indicates the main trend of the texture
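The six texture parameters listed above can be computed from a gray-level co-occurrence matrix (GLCM). The sketch below uses common textbook definitions in pure Python. The source does not state its formulas, window size, number of gray levels, pixel offset or software, so all of these are assumptions, and the 4 × 4 image is a toy example rather than study data.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Return a symmetric, normalised grey-level co-occurrence matrix."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # symmetric: count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Six GLCM statistics, keyed by the abbreviations used in the table."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
    # Correlation is undefined for a constant image (zero variance).
    corr = cov / math.sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else float("nan")
    return {
        "Mean": mean_i,
        "Hom": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "Dis": sum(v * abs(i - j) for i, j, v in cells),
        "Ent": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "Sm": sum(v * v for _, _, v in cells),
        "Corr": corr,
    }


# Toy image already quantised to 4 grey levels (illustrative only).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy, levels=4)      # horizontal neighbour, distance 1
feats = texture_features(p)
print(feats)
```

In practice the matrix is built inside a moving window for each band, so every pixel receives its own set of texture values; the window size and offset direction are tuning choices that this sketch leaves open.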

Table 3 Performance comparison of CNN yield estimation models using half-monthly and monthly multi-feature time-series data

Model	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE	R²
CNN-UAV-15	22.294 1	30.850 5	0.106 6	0.39
CNN-Sentinel-15	26.461 6	32.930 7	0.111 6	0.32
CNN-US-15	21.827 5	28.974 6	0.095 7	0.44
CNN-UAV-30	24.895 8	31.450 9	0.105 8	0.37
CNN-Sentinel-30	27.005 5	34.359 5	0.120 5	0.21
CNN-US-30	22.177 5	29.737 7	0.096 8	0.40
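The four metrics reported in the table above can be computed as in the following sketch. The formulas are not spelled out in this section, so the definitions here are assumptions: MAPE is taken as a fraction rather than a percentage, in line with the tabulated magnitudes, and R² is taken as the coefficient of determination, 1 − SS_res/SS_tot. The yield values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (as a fraction) and R2 for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical measured and predicted yields (illustrative values only).
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

For any set of predictions MAE cannot exceed RMSE, which gives a quick consistency check when reading tabulated results.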

Table 5 Performance of improved CNN models with different fusion methods for half-monthly time-series data

Model	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE	R²
SVM-US-CNN	42.304 6	55.776 7	0.191 5	0.38
SVM-TCA-CNN	35.184 3	45.623 3	0.168 3	0.66
SVM-GRU-CNN	31.577 1	38.886 2	0.153 2	0.73
SVM-LSTM-CNN	26.336 9	32.266 6	0.132 6	0.80
SA-US-CNN	10.730 5	14.740 8	0.050 4	0.85
SA-TCA-CNN	9.488 0	13.941 9	0.048 9	0.87
SA-GRU-CNN	8.403 8	12.283 8	0.037 8	0.94
SA-LSTM-CNN	10.219 8	14.569 3	0.045 4	0.91
SE-US-CNN	14.983 6	12.230 6	0.104 2	0.70
SE-TCA-CNN	11.693 5	13.983 2	0.096 3	0.71
SE-GRU-CNN	11.707 1	12.191 7	0.053 0	0.90
SE-LSTM-CNN	8.520 9	12.609 2	0.036 8	0.92

Table 6 Performance of improved CNN models with different fusion methods for monthly time-series data

Model	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE	R²
SVM-US-CNN	48.918 5	58.501 2	0.211 0	0.25
SVM-TCA-CNN	45.172 2	56.643 1	0.189 9	0.58
SVM-GRU-CNN	41.448 2	55.486 7	0.173 6	0.69
SVM-LSTM-CNN	28.893 5	36.069 6	0.152 6	0.79
SA-US-CNN	10.524 5	14.188 6	0.048 1	0.83
SA-TCA-CNN	7.777 5	10.654 0	0.035 5	0.86
SA-GRU-CNN	12.201 6	17.046 4	0.052 0	0.91
SA-LSTM-CNN	12.183 3	17.698 3	0.052 4	0.90
SE-US-CNN	14.974 6	32.250 4	0.108 2	0.66
SE-TCA-CNN	11.568 8	25.953 3	0.083 2	0.71
SE-GRU-CNN	17.902 1	25.219 7	0.076 3	0.78
SE-LSTM-CNN	14.953 4	20.938 2	0.063 8	0.85

Table 7 Ablation study results of the SA-GRU-CNN model on half-monthly time-series data

Model	R²	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE
UAV-CNN	0.37	24.90	31.45	0.106
US-CNN fusion	0.44	21.83	28.97	0.096
SG-CNN	0.71	15.17	18.63	0.064
GRU-CNN	0.90	12.53	17.25	0.056
SA-GRU-CNN	0.94	8.40	12.28	0.038

Table 8 Performance of improved CNN models with different fusion methods for half-monthly time-series data in 2023

Model	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE	R²
SVM-US-CNN	43.456 2	55.967 5	0.196 6	0.37
SVM-TCA-CNN	34.683 2	44.326 8	0.153 4	0.67
SVM-GRU-CNN	31.867 4	39.326 5	0.157 6	0.72
SVM-LSTM-CNN	26.635 9	31.324 4	0.113 5	0.81
SA-US-CNN	15.786 5	14.964 2	0.056 8	0.84
SA-TCA-CNN	9.569 0	13.082 6	0.040 9	0.86
SA-GRU-CNN	8.253 8	9.603 9	0.036 5	0.93
SA-LSTM-CNN	16.326 5	15.834 5	0.073 2	0.82
SE-US-CNN	14.504 9	12.130 9	0.103 3	0.71
SE-TCA-CNN	10.236 9	12.026 0	0.086 3	0.73
SE-GRU-CNN	8.962 3	11.160 2	0.070 9	0.79
SE-LSTM-CNN	8.054 2	9.863 4	0.059 6	0.91

Table 4 Performance comparison of CNN models with different fusion methods for half-monthly and monthly time-series data

Model	MAE/(kg/hm²)	RMSE/(kg/hm²)	MAPE	R²
CNN-US-15	21.827 5	28.974 6	0.095 7	0.44
CNN-TCA-15	15.172 2	18.633 1	0.064 3	0.88
CNN-GRU-15	12.526 8	17.254 5	0.055 5	0.90
CNN-LSTM-15	10.093 1	14.137 4	0.045 0	0.93
CNN-US-30	22.177 5	29.737 7	0.096 8	0.40
CNN-TCA-30	21.568 8	27.953 3	0.092 6	0.71
CNN-GRU-30	19.322 2	26.188 7	0.083 2	0.79
CNN-LSTM-30	18.030 7	25.131 2	0.080 7	0.80

Abstract

Cite this article

0 Introduction

1 Materials and methods

1.1 Study area

1.2 Data

1.2.1 Field data acquisition

Fig. 1 Five-point sampling combined with grid method for soybean field survey at Jianshan Farm

1.2.2 Sentinel-2 imagery acquisition and preprocessing

Table 1 Band information of acquired Sentinel-2 imagery

1.2.3 UAV imagery acquisition and preprocessing

Fig. 2 Comparison of UAV multispectral band information before and after radiometric correction

1.3 Feature extraction from remote sensing imagery

Table 2 GLCM texture parameters and their physical significance for soybean canopy characterization

Fig. 3 Spatial distribution of GLCM texture features extracted from Sentinel-2 and UAV imagery during a typical soybean growth stage

1.4 Feature fusion methods for UAV-Sentinel-2 imagery

1.4.1 Transfer learning fusion algorithm

1.4.2 Recurrent neural network algorithm

Fig. 4 Diagram of the gate units in an LSTM network

Fig. 5 Diagram of the gate units in the GRU network

1.5 Improved CNN models

1.5.1 CNN model improved by support vector machine

Fig. 6 SVM-CNN model structure diagram

1.5.2 CNN model improved by attention mechanism

Fig. 7 Calculation process of the self-attention mechanism module

Fig. 8 Schematic diagram of the squeeze-and-excitation (SE) attention mechanism module

Fig. 9 Structure of CNN-GRU/LSTM-Attention model

1.6 Feature fusion of multi-source data based on UAV-Sentinel-2

1.6.1 Construction of time series multi-feature imagery

Fig. 10 Multi-feature time-series imagery of Sentinel-2 for the full months of 2022 at Jianshan Farm, Heilongjiang Province

Fig. 11 Sampling point extraction scheme based on ArcGIS at Jianshan Farm

1.6.2 Feature fusion based on transfer learning algorithms

Fig. 12 Feature distribution alignment results of UAV and Sentinel-2 data using TCA algorithm

1.6.3 UAV-Sentinel-2 time-series feature fusion via LSTM/GRU

Fig. 13 Sliding time window processing for LSTM/GRU sequential feature fusion

Fig. 14 LSTM/GRU feature fusion process for UAV-Sentinel-2 multi-source data

2 Results and analysis

2.1 Establishment of soybean yield estimation models using CNN

Table 3 Performance comparison of CNN yield estimation models using half-monthly and monthly multi-feature time-series data

2.2 Comparative analysis of multi-source data fusion methods

Table 4 Performance comparison of CNN models with different fusion methods for half-monthly and monthly time-series data

2.3 Comparative analysis of improved CNN models

Table 5 Performance of improved CNN models with different fusion methods for half-monthly time-series data

Table 6 Performance of improved CNN models with different fusion methods for monthly time-series data

Table 7 Ablation study results of the SA-GRU-CNN model on half-monthly time-series data

2.4 Yield analysis of the optimal soybean yield estimation model

Fig. 15 Spatial distribution of predicted soybean yield for North 13, 14 and 15 Plots at Jianshan Farm in 2022

2.5 Validation of the optimal yield estimation model

Table 8 Performance of improved CNN models with different fusion methods for half-monthly time-series data in 2023

2.6 Discussions

3 Conclusions

References
