Smart Agriculture, 2025, Vol. 7, Issue (6): 136-148. doi: 10.12133/j.smartag.SA202508025

• Special Issue--Remote Sensing + AI Empowering the Modernization of Agriculture and Rural Areas •

Construction and Evaluation of Lightweight and Interpretable Soybean Remote Sensing Identification Model

WANG Yinhui1, ZHAO Anzhou2, LI Dan1, ZHU Xiufang3,4, ZHAO Jun5, WANG Ziqing6

    1. School of Earth Science and Engineering, Hebei University of Engineering, Handan 056038, China
    2. School of Mining and Geomatics Engineering, Hebei University of Engineering, Handan 056038, China
    3. State Key Laboratory of Remote Sensing and Digital Earth, Beijing Normal University, Beijing 100875, China
    4. Key Laboratory of Environmental Change and Natural Disaster, Ministry of Education, Beijing Normal University, Beijing 100875, China
    5. Qingdao Smart Village Development Service Center, Qingdao 266199, China
    6. Qingdao Acelmage Technologis Information Technology Co., Ltd., Qingdao 266114, China
  • Received: 2025-08-27 Online: 2025-11-30
  • Foundation items: National Key R&D Program of China (2023YFB3906201)
  • About author:

    WANG Yinhui, E-mail:

  • corresponding author:
    ZHU Xiufang, E-mail:

Abstract:

Objective Soybean is one of the world's most important crops, a major source of plant-based protein and vegetable oil that plays a key role in sustainable agricultural systems and global food security. Accurate and timely mapping of soybean cultivation areas is essential for agricultural monitoring, policy-making, and precision farming. However, existing remote sensing methods for soybean identification, such as threshold-based approaches, traditional machine learning, and deep learning, often face challenges related to model complexity, computational efficiency, and interpretability. These limitations call for a method that maintains classification accuracy while remaining computationally efficient, simple to operate, and interpretable. To this end, a lightweight and interpretable soybean mapping framework based on Sentinel-2 imagery and a binary logistic regression model was proposed in this study.
Methods Six representative agricultural regions within the primary U.S. soybean production belt were selected to capture the diversity of cultivation practices and environmental conditions across this major production area. Sentinel-2 imagery covering the complete growing season (April to October) from 2021 to 2023 was used. The USDA Cropland Data Layer served as reference data for model training and validation, benefiting from its extensive ground verification and statistical rigor. All Sentinel-2 images underwent preprocessing, including atmospheric correction, cloud and shadow masking with the scene classification layer, and spatial subsetting to the regions of interest. The Jeffries-Matusita distance was employed as a quantitative metric to identify the optimal temporal window for soybean discrimination. This measure evaluated the separability between soybean and other major crops across the growing season, with calculations performed on 10-day composite periods to ensure data quality and temporal consistency. The analysis revealed that late July to mid-September (day of year 210-260) provided maximum spectral separability, corresponding to soybean's critical reproductive stages (pod setting and filling), when its spectral signature is most distinct from that of other crops, particularly in the short-wave infrared region, which is sensitive to canopy structure and water content. Within this window, a binary logistic regression model was implemented that treated soybean identification as a probabilistic classification problem. The model was trained on spectral features from the optimal period through maximum likelihood estimation, yielding a computationally efficient framework that required optimization of only a limited number of parameters while remaining interpretable through explicit feature coefficients.
Results and Discussions The evaluation showed that the integrated approach balanced classification performance and operational practicality. The temporal optimization identified late July to mid-September as the peak discriminative period, which matches soybean's reproductive phenological stages, when its canopy spectral characteristics differ most from those of other crops. This finding was consistent across the three study years and multiple regions, supporting the robustness of the data-driven window selection. In the 2022 model construction region, the binary logistic regression model trained on features from this period achieved an overall accuracy of 0.90 and a Kappa coefficient of 0.79. When applied to independent validation regions in the same year, it maintained strong performance (overall accuracy 0.88, Kappa 0.76) without region-specific parameter adjustments, demonstrating good spatial transferability. Temporal validation further confirmed the model's robustness: across the 2021 to 2023 study period, it performed consistently in all regions, with an average accuracy of 0.87 and Kappa of 0.76. This inter-annual stability, maintained despite potential variations in annual weather, management practices, and planting schedules, highlights the advantage of basing the model on a stable phenological period rather than fixed calendar dates. The model's lightweight architecture also offered practical benefits: compared with complex ensemble or deep learning methods, it requires optimizing only a limited number of parameters. This parsimonious structure improves computational efficiency, enabling rapid training and deployment over large areas while reducing reliance on extensive labeled datasets, a key advantage in regions lacking sufficient ground truth data. Beyond accuracy and efficiency, the model was highly interpretable owing to its probabilistic framework and transparent feature weighting. Coefficient analysis quantified feature contributions, revealing that short-wave infrared bands and specific vegetation indices had the highest discriminative power during the optimal temporal window.
Conclusions A soybean mapping approach that balances accuracy with operational practicality by combining temporal optimization with binary logistic regression was proposed. The method offers a viable solution for operational agricultural monitoring, especially in resource-constrained environments. Future work can improve the robustness of the model under diverse regional conditions through cross-regional validation in different climate zones and cropping systems, or by integrating transfer learning with domain adaptation methods, which would improve its potential for global-scale application. Integrating additional data, methodologies, and models to achieve end-to-end feature learning should also be considered.
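
As an illustration of the temporal-window selection described in the abstract, the following minimal Python sketch computes the Jeffries-Matusita distance between soybean and one other class for each 10-day composite and picks the composite with the highest separability. It is not the authors' implementation. It assumes a single feature per composite and Gaussian class distributions (the usual closed form via the Bhattacharyya distance), and the reflectance samples, the day-of-year keys, and the names `jm_distance`, `composites`, and `best_doy` are hypothetical.

```python
import math
from statistics import mean, variance


def jm_distance(a, b):
    """Jeffries-Matusita distance between two 1-D samples.

    Assumes each class follows a Gaussian distribution, so the
    Bhattacharyya distance has a closed form. Result lies in [0, 2].
    """
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances
    bhattacharyya = (m1 - m2) ** 2 / (4 * (v1 + v2)) + 0.5 * math.log(
        (v1 + v2) / (2 * math.sqrt(v1 * v2))
    )
    return 2 * (1 - math.exp(-bhattacharyya))


# Hypothetical short-wave infrared reflectance samples, keyed by the
# starting day of year of each 10-day composite (values are made up).
composites = {
    180: {"soybean": [0.21, 0.22, 0.20, 0.23], "other": [0.22, 0.21, 0.23, 0.20]},
    220: {"soybean": [0.15, 0.16, 0.14, 0.15], "other": [0.24, 0.25, 0.23, 0.26]},
    270: {"soybean": [0.26, 0.28, 0.25, 0.27], "other": [0.27, 0.26, 0.29, 0.25]},
}

# Separability score per composite, then the most separable composite.
scores = {
    doy: jm_distance(s["soybean"], s["other"]) for doy, s in composites.items()
}
best_doy = max(scores, key=scores.get)

for doy in sorted(scores):
    print(f"DOY {doy}: JM = {scores[doy]:.3f}")
print("Most separable composite starts at DOY", best_doy)
```

With several spectral features, the same idea would use class mean vectors and covariance matrices in place of the scalar means and variances.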
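
The binary logistic regression step and the two reported metrics (overall accuracy and the Kappa coefficient) can be sketched in plain Python as follows. This is a toy example under stated assumptions, not the published model. The abstract specifies maximum likelihood estimation but not the optimizer, so gradient ascent on the log-likelihood is an assumption here, and the two standardized features (a short-wave infrared band and a vegetation index), all sample values, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(w, x):
    """Bias plus weighted sum of features; w = [bias, w1, w2, ...]."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Maximum likelihood fit by gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear(w, xi))  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    """Label a sample as soybean (1) when its probability reaches the threshold."""
    return [int(sigmoid(linear(w, xi)) >= threshold) for xi in X]


def accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's Kappa for binary labels."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    pe = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in (0, 1))
    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0  # chance-corrected
    return po, kappa


# Hypothetical standardized features: [SWIR band, vegetation index].
X_train = [[-1.0, 1.1], [-0.8, 0.9], [-1.2, 0.7], [-0.9, 1.3],
           [1.0, -0.9], [0.7, -1.2], [1.1, -0.8], [0.9, -1.0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = soybean, 0 = other crops

X_val = [[-1.0, 0.8], [-0.7, 1.0], [0.8, -1.1], [0.9, -0.7]]
y_val = [1, 1, 0, 0]

w = fit_logistic(X_train, y_train)
oa, kappa = accuracy_and_kappa(y_val, predict(w, X_val))
print("coefficients [bias, SWIR, index]:", [round(v, 2) for v in w])
print("overall accuracy:", oa, "Kappa:", kappa)
```

The sign and magnitude of each fitted coefficient indicate how a feature shifts the predicted soybean probability, which is the kind of coefficient-level interpretability the abstract refers to. The toy data are cleanly separable, so the metrics here say nothing about the accuracies reported in the study.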

Key words: Sentinel-2, binary logistic model, soybean, mapping, remote sensing, crop identification, lightweight
