
Smart Agriculture


Research on Maize Yield Prediction Model Based on Causal Inference and Machine Learning in Agricultural Fields

WANG Yi1, CUI Xitong1, WANG Chen1, XIONG Baowei1, SHAO Guomin2, WANG Wanying1, CAO Pei3, HAN Wenting3

    1. School of Information Science and Engineering, Xi'an University of Finance and Economics, Xi'an 710100, China
    2. State Key Laboratory of Eco-hydraulics in Northwest Arid Region, Xi'an University of Technology, Xi'an 710048, China
    3. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
  • Received: 2025-06-17; Online: 2026-01-26
  • Foundation items: National Social Science Fund Project (23BGL252); Basic Research Plan Project for Natural Sciences of Shaanxi Province (2022JQ-363); Key Innovation Chain Projects of Shaanxi Province (2024NC-ZDCYL-05-01); Key Innovation Chain Projects of Shaanxi Province (2023-ZDLNY-58); Xi'an Municipal Science and Technology Plan Project (25NJSYB00014)
  • About the author:

    WANG Yi, E-mail:

  • Corresponding authors:
    CAO Pei, E-mail:
    HAN Wenting, E-mail:

Abstract:

[Objective] Maize is one of the most important staple crops in China and a cornerstone of national food security and agricultural sustainability. Accurate and timely prediction of maize yield is essential for optimizing agricultural management, supporting market regulation, and guiding policy decisions on food supply and climate adaptation. In recent years, data-driven yield prediction methods based on machine learning and deep learning have notably improved predictive accuracy. However, most existing approaches rely primarily on statistical correlations among variables and often treat influencing factors as independent predictors, without explicitly addressing the causal mechanisms and time-lagged interactions that govern crop growth. This limitation can reduce model interpretability and robustness under changing environmental conditions. To address these challenges, this study proposes a maize yield prediction framework that integrates causal inference with a hybrid deep learning model, aiming to improve both predictive performance and mechanistic understanding.
[Methods] This study used multi-source heterogeneous data collected over the maize growing season, including remote sensing-derived vegetation indices, meteorological variables (e.g., temperature and precipitation), soil moisture measured at multiple profile depths, and crop observations at key phenological stages. First, the Peter–Clark and momentary conditional independence (PCMCI) causal discovery algorithm was applied to systematically identify causal relationships between maize yield and its potential driving factors. PCMCI detects both contemporaneous and time-lagged causal links while controlling for confounding effects in high-dimensional time series data.
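At the heart of PCMCI's momentary conditional independence (MCI) step is a conditional-independence test between a lagged driver and the target. The sketch below is a deliberately simplified, hypothetical illustration, not the authors' implementation and not the full PCMCI algorithm. It assumes linear Gaussian dependence, tests X[t-lag] -> Y[t] with a partial-correlation test, and fixes the conditioning set to the target's own previous value Y[t-1]. Real PCMCI first selects candidate parents with a PC-style stage and then conditions on the parents of both variables. The helper `lagged_partial_corr_test` and the synthetic series are illustrative only; a full implementation is available in, for example, the tigramite Python package.

```python
import math
import random


def _ols_residuals(v, z):
    """Residuals of v after least-squares regression on one covariate z (with intercept)."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szv = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
    beta = szv / szz if szz > 0 else 0.0
    return [vi - mv - beta * (zi - mz) for zi, vi in zip(z, v)]


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def lagged_partial_corr_test(x, y, lag):
    """Simplified MCI-style test of X[t-lag] -> Y[t], conditioning only on Y[t-1].

    Hypothetical helper: real PCMCI conditions on estimated parents of both variables.
    Returns (partial correlation, two-sided p-value from Fisher's z-transform).
    """
    start = max(lag, 1)  # need both X[t-lag] and Y[t-1] to exist
    x_lag = [x[t - lag] for t in range(start, len(y))]
    y_now = [y[t] for t in range(start, len(y))]
    y_prev = [y[t - 1] for t in range(start, len(y))]
    # Partial correlation = correlation of residuals after regressing out Y[t-1].
    r = _pearson(_ols_residuals(x_lag, y_prev), _ols_residuals(y_now, y_prev))
    n, k = len(y_now), 1  # k = size of the conditioning set
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return r, p


# Synthetic example (hypothetical data): Y depends on its own past and on X two steps earlier.
rng = random.Random(0)
n_steps = 500
x = [rng.gauss(0, 1) for _ in range(n_steps)]
y = [0.0] * n_steps
for t in range(n_steps):
    prev_y = y[t - 1] if t >= 1 else 0.0
    x_drive = x[t - 2] if t >= 2 else 0.0
    y[t] = 0.5 * prev_y + 0.8 * x_drive + rng.gauss(0, 0.5)

r2, p2 = lagged_partial_corr_test(x, y, lag=2)  # true causal lag: strong link, tiny p-value
r5, p5 = lagged_partial_corr_test(x, y, lag=5)  # no direct link: partial correlation near 0
print(f"lag 2: r={r2:.3f}, p={p2:.2g}")
print(f"lag 5: r={r5:.3f}, p={p5:.2g}")
```

Conditioning on Y[t-1] is what separates a direct lagged effect from one that merely travels through the target's own autocorrelation: at lag 5 the driver is related to Y[t] only via earlier values of Y, so its partial correlation collapses toward zero.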
Through this PCMCI analysis, the causal structure of yield formation was explicitly characterized, and the variables with statistically significant causal effects were selected as inputs to the prediction model. A hybrid moving average–convolutional neural network–long short-term memory (MA-CNN-LSTM) model was then constructed to capture the spatiotemporal patterns in the causally screened inputs. A moving average (MA) module served as a preprocessing step to suppress high-frequency noise and stabilize the input signals. A convolutional neural network (CNN) then extracted latent correlation features among multiple variables, reflecting their joint influence on yield formation. Finally, a long short-term memory (LSTM) network modeled temporal dependencies and cumulative effects across the growing season, representing the dynamic response of yield.
[Results and Discussions] The causal analysis revealed that soil moisture at depths of 10 cm and 50 cm exerted a significant positive influence on maize yield (P < 0.01), with the deeper (50 cm) layer showing a stronger and more persistent time-lagged effect. This finding highlights the critical role of subsurface water availability in sustaining crop growth during later developmental stages. In addition, vegetation indices such as the modified chlorophyll absorption ratio index (MCARI) and the normalized difference vegetation index (NDVI) exhibited significant short-term causal relationships with yield during the mid-growth stage of maize, indicating their sensitivity to canopy structure and photosynthetic activity during this period. Comparative experiments against traditional statistical models and conventional machine learning approaches showed that the proposed PCMCI-MA-CNN-LSTM framework consistently achieved superior predictive performance.
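The moving average (MA) preprocessing step described in the Methods can be sketched as follows, assuming a trailing (causal) window applied to each input series independently. The window length, the trailing form of the average, and the feature names are assumptions for illustration, since the abstract does not specify them; the CNN and LSTM stages are not reproduced here.

```python
def moving_average(series, window):
    """Trailing (causal) moving average with the same length as the input.

    Each output averages the current value and up to window-1 preceding values,
    so no future observation is used; the first window-1 outputs average over
    the shorter history that is available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def smooth_features(features, window):
    """Smooth each named input series independently (hypothetical helper)."""
    return {name: moving_average(values, window) for name, values in features.items()}


# Hypothetical feature names and values, for illustration only.
features = {
    "soil_moisture_50cm": [0.0, 10.0, 0.0, 10.0, 0.0, 10.0],  # high-frequency oscillation
    "ndvi": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],                  # steady trend
}
smoothed = smooth_features(features, window=2)
print(smoothed["soil_moisture_50cm"])      # [0.0, 5.0, 5.0, 5.0, 5.0, 5.0]
print(moving_average([1, 2, 3, 4, 5], 3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

A trailing window is a conservative choice for forecasting: a centered window would mix in observations from after the date being predicted.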
On the test set, the coefficient of determination (R²) reached 0.955, and the mean absolute error (MAE) and root mean square error (RMSE) were reduced to 1.201 and 1.474, respectively. These results indicate that causal variable selection improves model accuracy and stability by reducing redundant and spurious correlations.
[Conclusions] The results confirm that incorporating causal analysis into yield modeling provides a robust basis for identifying key driving variables and improves both the accuracy and the interpretability of maize yield prediction. The proposed framework offers a promising approach to precision agriculture and decision support in crop yield forecasting, particularly under complex and dynamic agro-environmental conditions.
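The three evaluation metrics reported in the Results follow their standard definitions: R² = 1 - SS_res/SS_tot, MAE = mean(|y - ŷ|), and RMSE = sqrt(mean((y - ŷ)²)). The sketch below computes them in plain Python; the function name and the toy yield values are made up for illustration and are unrelated to the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Toy values for illustration only (not the study's data).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}, MAE={mae:.3f}, RMSE={rmse:.3f}")  # R2=0.900, MAE=0.500, RMSE=0.707
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE, and the gap widens when a few predictions have large errors.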

Key words: PCMCI, MA-CNN-LSTM, maize yield prediction

CLC Number: