
Smart Agriculture, 2026, Vol. 8, Issue (2): 175-187. doi: 10.12133/j.smartag.SA202506027

• Information Processing and Decision Making •

Farmland Maize Yield Prediction Model Based on Causal Inference and Machine Learning

WANG Yi1,6, CUI Xitong1, WANG Chen1, XIONG Baowei1, SHAO Guomin2, WANG Wanying1, CAO Pei3, HAN Wenting4,5

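    1. School of Information Science and Engineering, Xi'an University of Finance and Economics, Xi'an 710100, Shaanxi, China
    2. State Key Laboratory of Eco-hydraulics in Northwest Arid Region, Xi'an University of Technology, Xi'an 710048, Shaanxi, China
    3. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
    4. Institute of Water-saving Agriculture in Arid Areas, Northwest A&F University, Yangling 712100, Shaanxi, China
    5. National Engineering Laboratory for Crop High-efficiency Water Use, Yangling 712100, Shaanxi, China
    6. Intelligent Financial Collaborative Trusted Computing Key Laboratory of Shaanxi Province for Higher Education Institutions, Xi'an 710100, Shaanxi, China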
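  • Received: 2025-06-17 Published: 2026-03-30
  • Foundation items:
    National Social Science Fund Project (23BGL252); Basic Research Plan Project for Natural Sciences of Shaanxi Province (2022JQ-363); Key Innovation Chain Projects of Shaanxi Province (2024NC-ZDCYL-05-01); Key Innovation Chain Projects of Shaanxi Province (2023-ZDLNY-58); Xi'an Municipal Science and Technology Plan Project (25NJSYB00014)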
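  • About author:

    WANG Yi, Ph.D., lecturer. Research interests: integrated space-air-ground intelligent sensing of agricultural conditions and precision operation technology. E-mail:

  • Corresponding author:
    CAO Pei, Ph.D., associate professor. Research interests: integrated space-air-ground intelligent sensing of agricultural water information, and precision irrigation technology and equipment. E-mail:
    HAN Wenting, Ph.D., researcher. Research interests: integrated space-air-ground intelligent sensing of agricultural water information, and precision irrigation technology and equipment. E-mail: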

Field Maize Yield Prediction Model Based on Causal Inference and Machine Learning in Agricultural Fields

WANG Yi1,6, CUI Xitong1, WANG Chen1, XIONG Baowei1, SHAO Guomin2, WANG Wanying1, CAO Pei3, HAN Wenting4,5

    1. School of Information Science and Engineering, Xi'an University of Finance and Economics, Xi'an 710100, China
    2. State Key Laboratory of Eco-hydraulics in Northwest Arid Region, Xi'an University of Technology, Xi'an 710048, China
    3. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
    4. Institute of Water-saving Agriculture in Arid Areas (IWSA), Northwest A&F University, Yangling 712100, China
    5. National Engineering Laboratory for Crop High-efficiency Water Use, Yangling 712100, China
    6. Intelligent Financial Collaborative Trusted Computing Key Laboratory of Shaanxi Province for Higher Education Institutions, Xi'an 710100, China
  • Received: 2025-06-17 Online: 2026-03-30
  • Foundation items: National Social Science Fund Project (23BGL252); Basic Research Plan Project for Natural Sciences of Shaanxi Province (2022JQ-363); Key Innovation Chain Projects of Shaanxi Province (2024NC-ZDCYL-05-01); Key Innovation Chain Projects of Shaanxi Province (2023-ZDLNY-58); Xi'an Municipal Science and Technology Plan Project (25NJSYB00014)
  • About author:

    WANG Yi, E-mail:

  • Corresponding author:
    CAO Pei, E-mail:
    HAN Wenting, E-mail:

Abstract (from the Chinese-language version):

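[Objective/Significance] Maize is one of China's main grain crops, and accurate prediction of its yield is important for keeping the national food supply system stable. Because traditional prediction methods struggle to reveal causal relationships among variables, a prediction method that combines causal inference with machine learning is proposed. [Methods] First, drawing on multi-source heterogeneous information, including remote sensing indices, meteorological data, soil profile moisture, and stage-wise crop observations, the Peter-Clark and momentary conditional independence (PCMCI) algorithm was used to systematically analyze the causal relationships between maize yield and its influencing factors. Second, a hybrid moving average-convolutional neural network-long short-term memory (MA-CNN-LSTM) model was built: the moving average module first smooths noise in the data, and the convolutional neural network then extracts spatial correlation features among the variables. Finally, the long short-term memory network captures how yield changes over time. [Results and Discussions] Soil moisture at depths of 10 cm and 50 cm had a significant positive effect on yield (P<0.01), with deeper moisture showing a stronger time-lag effect; vegetation indices derived from reflectance, such as the modified chlorophyll absorption ratio index and the normalized difference vegetation index, showed short-term causal relationships during the mid-growth stage of maize. Comparison with other models showed that the proposed model performed strongly on several metrics: the coefficient of determination (R²) reached 0.955, and the mean absolute error and root mean square error were 1.201 kg/mu and 1.474 kg/mu, respectively (1 hm² = 15 mu). [Conclusions] Causal analysis provides a basis for screening the key variables used by the model and significantly improves yield prediction performance.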
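The lag-aware causal screening described in the methods can be illustrated with a single lagged partial-correlation test. The sketch below is not the PCMCI algorithm and not the authors' code: it only shows the kind of conditional independence check such an analysis builds on, here the correlation between a driver at time t-lag and a target at time t after controlling for the target's own previous value. The variable names (`moisture`, `growth`), the synthetic data, and the built-in 3-step delay are all assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def lagged_partial_corr(driver, target, lag):
    """Partial correlation of driver[t - lag] with target[t] given target[t - 1]."""
    start = max(lag, 1)  # need both t - lag and t - 1 to be valid indices
    steps = range(start, len(target))
    x = [driver[t - lag] for t in steps]
    y = [target[t] for t in steps]
    z = [target[t - 1] for t in steps]
    return partial_corr(x, y, z)


# Synthetic example (hypothetical): the driver acts on the target after 3 steps.
random.seed(0)
n = 400
moisture = [random.gauss(0, 1) for _ in range(n)]
growth = [0.0] * n
for t in range(3, n):
    growth[t] = 0.5 * growth[t - 1] + 0.8 * moisture[t - 3] + random.gauss(0, 0.5)

# Scan candidate lags and keep the one with the strongest conditional dependence.
scores = {lag: lagged_partial_corr(moisture, growth, lag) for lag in range(6)}
best_lag = max(scores, key=lambda k: abs(scores[k]))
print(best_lag, round(scores[best_lag], 3))
```

Conditioning on the target's previous value is what separates this from a plain lagged correlation: without it, neighbouring lags would also look related to the target simply because the target is autocorrelated. A full PCMCI analysis additionally selects the conditioning sets iteratively and controls for multiple variables at once.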

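Keywords: Peter-Clark and momentary conditional independence (PCMCI) algorithm, moving average, convolutional neural network, long short-term memory network, yield prediction, machine learning, causal inference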

Abstract:

[Objective] Maize is one of the most important staple crops in the world and serves as a cornerstone of food security and agricultural sustainability. Accurate and timely prediction of maize yield is essential for optimizing agricultural management practices, supporting market regulation, and guiding policy decisions related to food supply and climate adaptation. In recent years, data-driven yield prediction methods based on machine learning and deep learning have achieved notable improvements in predictive accuracy. However, most existing approaches primarily rely on statistical correlations among variables and often treat influencing factors as independent predictors, without explicitly addressing the complex causal mechanisms and time-lagged interactions that govern crop growth processes. This limitation may lead to reduced model interpretability and compromised robustness under changing environmental conditions. To address these challenges, a novel maize yield prediction framework that integrates causal inference with a hybrid deep learning model was proposed, aiming to improve both predictive performance and mechanistic understanding. [Methods] Multi-source heterogeneous datasets collected across the maize growing season were utilized, including remote sensing-derived vegetation indices, meteorological variables (such as temperature and precipitation), soil profile moisture measurements at multiple depths, and crop observation data corresponding to key phenological stages. First, the Peter-Clark and momentary conditional independence (PCMCI) causal discovery algorithm was applied to systematically identify causal relationships between maize yield and its potential driving factors. The PCMCI method enables the detection of both contemporaneous and time-lagged causal links while effectively controlling for confounding effects in high-dimensional time series data. 
Through this process, the causal structure of yield formation was explicitly characterized, and key variables with statistically significant causal impacts were selected as inputs for the prediction model. Subsequently, a hybrid moving average-convolutional neural network-long short-term memory (MA-CNN-LSTM) model was constructed to capture the complex spatiotemporal patterns in the causally screened input variables. Specifically, a moving average module was employed as a preprocessing step to suppress high-frequency noise and enhance signal stability. A CNN was then used to extract latent correlation features among multiple variables, reflecting their joint influence on yield formation. Finally, an LSTM network was adopted to model temporal dependencies and cumulative effects across the growing season, enabling effective representation of dynamic yield responses. [Results and Discussions] The causal analysis revealed that soil moisture at depths of 10 cm and 50 cm exerted a significant positive influence on maize yield (P < 0.01), with deeper soil moisture showing a stronger and more persistent time-lagged effect. This finding highlighted the critical role of subsurface water availability in sustaining crop growth during later developmental stages. In addition, vegetation indices such as the modified chlorophyll absorption ratio index and the normalized difference vegetation index exhibited significant short-term causal relationships with yield during the mid-growth stage of maize, indicating their sensitivity to canopy structure and photosynthetic activity during this period. Comparative experiments conducted against traditional statistical models and conventional machine learning approaches demonstrated that the proposed PCMCI-MA-CNN-LSTM framework consistently achieved superior predictive performance. 
On the test dataset, the coefficient of determination (R²) reached 0.955, while the mean absolute error (MAE) and root mean square error (RMSE) were 1.201 kg/mu and 1.474 kg/mu, respectively (1 hm² = 15 mu). These results indicated that incorporating causal variable selection effectively enhances model accuracy and stability by reducing redundant and spurious correlations. [Conclusions] The results confirm that incorporating causal analysis into yield modeling provides a robust basis for identifying key driving variables and effectively enhances the accuracy and interpretability of maize yield prediction. The proposed framework offers a promising approach for precision agriculture and decision support in crop yield forecasting, particularly under complex and dynamic agro-environmental conditions.
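For readers who want to reproduce the three reported metrics on their own predictions, a minimal sketch follows. It uses the standard definitions of R², MAE, and RMSE together with the conversion stated in the abstract (1 hm² = 15 mu). The function names and the four sample yield values are made up for illustration and are not data from the study.

```python
import math

MU_PER_HECTARE = 15  # conversion stated in the abstract: 1 hm2 = 15 mu


def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the inputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def per_mu_to_per_hectare(value_per_mu):
    """Convert a per-mu quantity (e.g. kg/mu) to a per-hectare quantity (kg/hm2)."""
    return value_per_mu * MU_PER_HECTARE


# Hypothetical observed and predicted yields in kg/mu (illustrative only).
observed = [600.0, 620.0, 580.0, 640.0]
predicted = [602.0, 618.0, 583.0, 639.0]

print("MAE  (kg/mu):", mae(observed, predicted))
print("RMSE (kg/mu):", round(rmse(observed, predicted), 4))
print("R2          :", round(r2(observed, predicted), 4))
print("MAE  (kg/hm2):", per_mu_to_per_hectare(mae(observed, predicted)))
```

Under the stated conversion, the reported MAE of 1.201 kg/mu corresponds to about 18.0 kg/hm², and the reported RMSE of 1.474 kg/mu to about 22.1 kg/hm².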

Key words: Peter-Clark and momentary conditional independence (PCMCI), moving average, convolutional neural network, long short-term memory, yield prediction, machine learning, causal inference

CLC number: