Smart Agriculture

   

Geographically Weighted Random Forest for County-scale Digital Mapping of Soil Organic Matter in the Central Shandong Mountains

ZHANG Shulin¹, CUI Liqin³, LIU Jian¹, ZHANG Canting¹, WANG Hongjia¹, ZHANG Tingting¹, WANG Ailing¹,²

  1. College of Resources and Environment, Shandong Agricultural University, Tai'an 271018, China
  2. National Engineering Research Center for Efficient Utilization of Soil and Fertilizer, Shandong Agricultural University, Tai'an 271018, China
  3. Yiyuan County Agriculture and Rural Affairs Bureau, Zibo 256100, China
  • Received: 2025-08-21; Online: 2026-01-06
  • Foundation items: National Natural Science Foundation of China (42171378); Natural Science Foundation of Shandong Province (ZR2021MD018); Special Funds of Taishan Scholar of Shandong Province (tsqnz20231205)
  • About author:
    ZHANG Shulin, E-mail:
  • Corresponding author:
    WANG Ailing, E-mail:

Abstract:

[Objective] Soil organic matter (SOM) is a fundamental indicator of soil fertility and soil quality. In mountainous counties with complex terrain and pronounced environmental heterogeneity, SOM varies strongly even over short distances, which often limits the prediction accuracy of conventional digital soil mapping (DSM) models. With the nationwide implementation of the Third National Soil Census, the demand for high-resolution, high-accuracy SOM mapping at the county scale has become increasingly urgent. Against this backdrop, Yiyuan County in Shandong Province was selected as the study area to assess the applicability of the geographically weighted random forest (GWRF) model to SOM mapping in regions of complex terrain. The study also systematically compared the predictive performance of GWRF with that of several commonly used models, to provide technical support for soil resource surveys, census result compilation, and county-level land management.

[Methods] A dataset of 1,565 measured topsoil SOM samples was used, together with nineteen environmental variables in five categories: topography, climate, vegetation, soil properties, and land use. Correlation analysis and collinearity diagnostics were used to retain twelve key variables for model construction. The GWRF model, which integrates localized spatial modeling with nonlinear machine-learning capability, was developed to generate high-resolution SOM predictions across the study region. An adaptive bandwidth strategy was employed, and the optimal bandwidth was determined to be 500. Grid search combined with cross-validation identified an optimal mtry value of 4 for the random forest component. Four reference models were constructed for comparison: ordinary kriging (OK), multiple linear regression (MLR), geographically weighted regression (GWR), and random forest (RF). Model performance was evaluated using two commonly adopted accuracy metrics: the coefficient of determination (R²) and the root-mean-square error (RMSE).

[Results and Discussions] This study explored the spatial pattern of SOM in the study area and systematically compared the performance of multiple DSM models. Overall, SOM levels in Yiyuan County were relatively low, with a mean of 15.62 g/kg. Spatial variation was moderate and showed a clear pattern: SOM was higher in the central region and lower in the northeastern and southwestern areas. Prediction accuracy differed considerably among the five models. The GWRF model achieved the best overall performance, with an R² of 0.48 and an RMSE of 5.12 g/kg. This accuracy clearly surpassed that of RF (R² = 0.41) and GWR (R² = 0.35), and the advantage over MLR and OK was even more pronounced. A paired-sample t-test further confirmed that the accuracy improvements of GWRF over the other four models were statistically significant, supporting the robustness of the gain. In the mapping results, the OK model produced an excessively smooth surface that obscured local detail. The MLR and GWR models characterized certain environmental effects but showed marked biases, such as underestimation of high values and overestimation of low values. In contrast, the GWRF model captured both global trends and subtle local variations. Variable importance analysis showed that soil type, annual evapotranspiration, slope, and sand content were the most influential factors governing SOM distribution in the study area, and the importance of these variables varied notably across space.

[Conclusions] The GWRF model showed clear advantages for county-scale digital mapping of SOM in mountainous regions. Its prediction accuracy markedly exceeded that of RF and conventional linear models, owing to its ability to capture nonlinear environmental relationships and localized spatial variations simultaneously. The enhanced mapping precision and improved representation of spatial detail highlight the potential of GWRF for applications requiring high-accuracy soil information, such as the Third National Soil Census. The successful implementation of GWRF in this study suggests that the model is well suited to SOM prediction under complex terrain conditions and can serve as an effective technical tool for county-level soil property estimation. Future research may incorporate human-activity-related variables, employ localized variable-selection strategies within the GWRF framework to further refine model performance, and explore the potential of more advanced deep learning models in soil property mapping.
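
As an illustration of two ideas named in the abstract, the following minimal Python sketch shows how the accuracy metrics and an adaptive bandwidth might be computed. It is not the authors' implementation. Two assumptions are made: R² is taken as 1 − SS_res/SS_tot (the abstract does not say which definition was used), and the adaptive bandwidth is read as a count of nearest samples, as is common in geographically weighted models. The function names `rmse`, `r2`, and `adaptive_neighbours` are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the unit of the inputs (g/kg for SOM)."""
    n = len(observed)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / n)


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adaptive_neighbours(target, sample_xy, bandwidth):
    """Indices of the `bandwidth` samples nearest to `target`, nearest first.

    Assumption: with an adaptive bandwidth, each local model is fitted on a
    fixed number of nearest samples rather than within a fixed distance.
    """
    order = sorted(
        range(len(sample_xy)),
        key=lambda i: math.dist(target, sample_xy[i]),
    )
    return order[:bandwidth]


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    obs = [10.0, 14.0, 18.0, 22.0]
    pred = [12.0, 14.0, 16.0, 22.0]
    print(rmse(obs, pred), r2(obs, pred))
    print(adaptive_neighbours((0, 0), [(3, 4), (1, 0), (0, 2), (10, 10)], 2))
```

In a full GWRF workflow, a random forest would be fitted on the samples returned by `adaptive_neighbours` for each location; that step is omitted here because the abstract does not describe the kernel or weighting scheme.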

Key words: soil organic matter, digital soil mapping, mountainous regions, geographically weighted random forest

CLC Number: