Smart Agriculture

Optimal Sampling Strategy for Soil Organic Matter Mapping Based on Intelligent Optimization Algorithms and Machine Learning

LIAN Zhenxiang1,2, FEI Xufeng2, REN Zhouqiao2

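  1. School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou 311300, Zhejiang, China
    2. Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, Zhejiang, China
  • Received: 2025-08-27 Published: 2025-12-18
  • Funding:
    National Key Research and Development Program (2023YFD1902900)
  • About the author:

    LIAN Zhenxiang, master's student; research interests: big data and intelligent decision-making for agricultural and forestry resources. E-mail:

  • Corresponding author:
    REN Zhouqiao, Ph.D., researcher; research interest: digital cultivated land. E-mail: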

Optimal Sampling Strategy for Soil Organic Matter Based on Hippopotamus Optimization Algorithm and Machine Learning

LIAN Zhenxiang1,2, FEI Xufeng2, REN Zhouqiao2

  1. School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou 311300, China
    2. Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
  • Received:2025-08-27 Online:2025-12-18
  • Foundation items:National Key Research and Development Program(2023YFD1902900)
  • About author:

    LIAN Zhenxiang, E-mail:

  • Corresponding author:
    REN Zhouqiao, E-mail:

Abstract (Chinese version):

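[Objective/Significance] Soil organic matter (SOM) is a core indicator of soil quality, so mapping SOM is of considerable importance. Although machine learning has become an important means of improving the accuracy of digital soil mapping (DSM), its performance depends on the quality of the input sampling data. A well-designed layout of sampling points is therefore a key prerequisite for DSM. This study aims to eliminate redundant sampling points, reduce sampling cost, and further improve the prediction accuracy of SOM. [Methods] An optimal sampling strategy was constructed based on the hippopotamus optimization algorithm (HO) combined with random forest residual kriging (RFRK). Starting from the 1 080 soil sampling points already laid out and surveyed in Lanxi city, Zhejiang province, and drawing on remote sensing environmental covariates, multiple sampling schemes were generated by optimization to determine the optimal sampling density and sample locations for the spatial prediction and mapping of SOM. [Results and Discussion] Among the HO-optimized sampling schemes, the best performance was obtained at a sampling density of 2.3 points/km² (629 sampling points): the root mean square error (RMSE) and mean absolute error (MAE) fell to 5.11 and 3.79 g/kg, respectively, the coefficient of determination (R²) was 0.49, Lin's concordance correlation coefficient (LCCC) reached 0.63, and the sampling cost was 41.8% lower than that of the original scheme. [Conclusions] In summary, when sampling cost and prediction accuracy are weighed together, HO is a potentially effective sampling optimization method and can serve as a reference for the spatial prediction and mapping of SOM in similar regions.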
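The 41.8% saving quoted above can be reproduced from the two point counts in the abstract. The sketch below is only an arithmetic check and rests on an assumption that is not stated in the source: that survey cost scales linearly with the number of sampling points. The function names are illustrative, and the mapped area is merely implied by the rounded density, not a figure reported by the authors.

```python
# Figures quoted in the abstract.
ORIGINAL_POINTS = 1080    # sampling points in the original survey
OPTIMIZED_POINTS = 629    # points retained in the best HO scheme
OPTIMIZED_DENSITY = 2.3   # points per km^2 for the best HO scheme


def cost_reduction(original: int, optimized: int) -> float:
    """Fractional saving, ASSUMING cost is proportional to point count."""
    return 1.0 - optimized / original


def implied_area_km2(points: int, density: float) -> float:
    """Area implied by a point count and a density in points per km^2."""
    return points / density


if __name__ == "__main__":
    saving = cost_reduction(ORIGINAL_POINTS, OPTIMIZED_POINTS)
    area = implied_area_km2(OPTIMIZED_POINTS, OPTIMIZED_DENSITY)
    print(f"cost reduction: {saving:.1%}")
    print(f"area implied by the rounded density: {area:.1f} km^2")
```

Under that linear-cost assumption, dropping from 1 080 to 629 points gives the reported 41.8%.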

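Key words: hippopotamus optimization algorithm, sampling optimization, random forest, kriging interpolation, soil organic matter, remote sensing mapping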

Abstract:

[Objective] Soil quality is crucial for food security, ecosystem health, and sustainable development, but faces degradation due to intensive land use. Accurate soil quality assessment is therefore essential for informed land management and ecological protection. Machine learning has enhanced digital soil mapping (DSM) by improving modeling accuracy through multi-source data integration. Within DSM, soil sampling design is a foundational step that directly influences prediction accuracy, cost, and efficiency. An ideal scheme must balance mapping precision with economic and operational feasibility. This study focuses on soil organic matter (SOM), a core indicator of soil quality affecting fertility, carbon sequestration, and environmental regulation. Precisely mapping its spatial variability is vital for sustainable soil management. To address the need for efficient sampling, the aim is to develop an optimal sampling design method for regional-scale SOM mapping. The objective is to reduce sampling redundancy and cost while improving spatial prediction accuracy. [Methods] A sampling optimization framework was proposed that integrated intelligent optimization algorithms with a hybrid spatial interpolation model. The framework was built upon the hippopotamus optimization algorithm (HO) and incorporated the random forest residual kriging (RFRK) method to construct an optimal sampling strategy for the spatial prediction of SOM. At the initialization stage, a population of candidate solutions—referred to as 'hippopotamuses'—was randomly generated, with each individual representing a potential sampling layout. In this study, the HO was employed to select subsets of sampling points from the training sample pool, with each subset forming a candidate solution. Collectively, these solutions constituted the initial hippopotamus population. The study area was located in Lanxi city, Zhejiang province, where a total of 1 080 field-measured soil samples were collected. 
These samples were partitioned into a training set (n=756), a validation set (n=108), and a test set (n=216) at a ratio of 7:1:2. Environmental covariates, including terrain attributes, vegetation indices, and climate factors, were extracted from multi-source remote sensing datasets. Using these covariates, the HO optimized sampling schemes across varying densities and spatial configurations. The resulting designs were then evaluated using the RFRK model to assess their SOM prediction performance. This process enabled the identification of the optimal sampling density and spatial layout that balanced accuracy and cost-efficiency. [Results and Discussions] When the HO-RFRK framework was applied, the prediction accuracy of SOM improved significantly as sampling density increased from 0.5 to 2.3 points/km² (136-629 points). The root mean square error (RMSE) on the test set decreased from 6.04 to 5.11 g/kg, a reduction of approximately 15.4%. The lowest prediction errors were observed at a sampling density of 2.3 points/km², where the RMSE and mean absolute error (MAE) reached their minimum values of 5.11 and 3.79 g/kg, respectively; further increases in density yielded only marginal gains, indicating diminishing returns. To assess the effectiveness of HO, its performance was compared with three established methods: conditioned Latin hypercube sampling (cLHS), genetic algorithm (GA), and particle swarm optimization (PSO). At lower densities (0.5-1.3 points/km²), all methods showed limited predictive power. However, at 1.4 points/km² (383 points), the HO method was the first to exceed the predefined accuracy thresholds (coefficient of determination, R²>0.40; Lin's concordance correlation coefficient, LCCC>0.55), achieving R²=0.41 and LCCC=0.57 and outperforming cLHS (R²=0.38, LCCC=0.53), GA (R²=0.39, LCCC=0.52), and PSO (R²=0.38, LCCC=0.51). Across the range of 1.4-2.3 points/km², HO consistently delivered superior results. 
At 2.3 points/km², the HO-RFRK combination achieved R²=0.49 and LCCC=0.63, surpassing cLHS, GA, and PSO in both metrics. [Conclusions] Using the cultivated land of Lanxi city as a test case, a novel sampling optimization strategy based on the HO is proposed. First, the strategy identified an optimal sampling density that maximizes prediction accuracy, as well as a lower, cost-effective density that maintains robust predictive performance at substantially reduced survey cost. Together, these define a practical density range that balances precision and economic feasibility. Second, the RFRK model consistently demonstrated higher prediction accuracy than the standard random forest (RF) model across all tested sampling schemes, validating the effectiveness of the integrated HO-RFRK approach. In summary, this optimized strategy achieves high mapping accuracy with greater sampling efficiency, offering a scientifically grounded and practical methodology for reducing long-term soil monitoring costs. It provides a valuable reference for optimizing soil surveys in Lanxi city and other regions with similar environmental settings.
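The four accuracy measures named in the abstract (RMSE, MAE, R², LCCC) can be written out in a few lines. The following is a minimal sketch using textbook definitions, not the authors' code: it assumes R² is computed as 1 − SS_res/SS_tot (the paper may use another definition, such as squared correlation) and that LCCC uses population variances. The helper `meets_thresholds` simply encodes the R²>0.40 and LCCC>0.55 cut-offs quoted in the abstract; all function names are illustrative.

```python
import math
from statistics import fmean


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return fmean([abs(o - p) for o, p in zip(obs, pred)])


def r2(obs, pred):
    """Coefficient of determination, ASSUMING the 1 - SS_res/SS_tot form."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    mean_o, mean_p = fmean(obs), fmean(pred)
    var_o = fmean([(o - mean_o) ** 2 for o in obs])
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    cov = fmean([(o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)])
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def meets_thresholds(r2_value, lccc_value, r2_min=0.40, lccc_min=0.55):
    """True when both scores exceed the cut-offs quoted in the abstract."""
    return r2_value > r2_min and lccc_value > lccc_min


if __name__ == "__main__":
    # Made-up SOM values in g/kg, for illustration only.
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 39.0]
    print(rmse(observed, predicted), mae(observed, predicted))
    print(r2(observed, predicted), lccc(observed, predicted))
    # Scores reported at 1.4 points/km²: HO passes, cLHS does not.
    print(meets_thresholds(0.41, 0.57), meets_thresholds(0.38, 0.53))
```

Unlike R², LCCC penalizes systematic bias through the squared difference of means in its denominator, which is why the two measures are reported side by side.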

Key words: hippopotamus optimization algorithm, sampling optimization, random forest, kriging interpolation, soil organic matter, remote sensing mapping

CLC number: