
Smart Agriculture, 2023, Vol. 5, Issue 2: 82-92. doi: 10.12133/j.smartag.SA202304004

• Special Topic: Machine Vision and Agricultural Intelligent Perception •

Yield Prediction Models in Guangxi Sugarcane Planting Regions Based on Machine Learning Methods

SHI Jiefeng1, HUANG Wei1, FAN Xieyang1, LI Xiuhua1,2, LU Yangxu1, JIANG Zhuhui3, WANG Zeping4, LUO Wei1, ZHANG Muqing2

  1. School of Electrical Engineering, Guangxi University, Nanning 530004, China
  2. Guangxi Key Laboratory of Sugarcane Biology, Guangxi University, Nanning 530004, China
  3. Guangxi Sugar Industry Group, Nanning 530022, China
  4. Sugarcane Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
  • Received: 2023-04-08 Online: 2023-06-30
  • Supported by:
    Guangxi Science and Technology Major Project (Guike AA22117004); National Natural Science Foundation of China project (31760342)
  • About the author: SHI Jiefeng, research interest: agricultural informatization. E-mail: 1500807980@qq.com
  • Corresponding author: LI Xiuhua, Ph.D., Associate Professor, research interests: crop detection and agricultural informatization. E-mail: lixh@gxu.edu.cn

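Abstract (Chinese-language version):

[Objective/Significance] To analyze the relationship between sugarcane yield and meteorological factors in the main sugarcane-producing areas of Guangxi, and to predict sugarcane yield from meteorological data, providing scientific data support for sugar mills and the relevant management departments. [Methods] Yield data from sugarcane regions in five different prefecture-level cities of Guangxi for 2002-2019 and 14 kinds of daily meteorological data were used. For each year, the mean of each meteorological factor over 78 consecutive time spans that grow month by month was correlated with yield. Key meteorological factors were determined by sensitive time span analysis, and the influence of each factor on yield during its sensitive time span was analyzed. Single-region yield prediction models were built with BP neural network (BPNN), support vector machine (SVM), random forest (RF), and long short-term memory (LSTM), and control experiments were run with meteorological means over the whole growth period as model input. The Hodrick-Prescott (HP) filter was used to separate the meteorological yield of sugarcane; the data from the five regions were then pooled, and general multi-region meteorological yield prediction models were built with RF, SVM, BPNN, and LSTM. [Results and Discussion] For single regions, models based on sensitive time span analysis predicted clearly better than models using whole-growth-period meteorological means, and the LSTM model predicted clearly better than the widely used BPNN, SVM, and RF models under both data processing methods. The LSTM model with sensitive time span analysis had an overall root mean square error (RMSE) of 10.34 t/ha and a mean absolute percentage error (MAPE) of 6.85%, with a coefficient of determination Rᵥ² of 0.8489. For multiple regions, LSTM predicted poorly, while RF, SVM, and BPNN all achieved good results; the best model, BPNN, had an RMSE of 0.98 t/ha, a MAPE of 9.59%, and an Rᵥ² of 0.965. [Conclusions] The key meteorological factors screened by sensitive time span analysis were all significantly correlated with yield, and the sensitive time spans allow the influence of each meteorological factor on yield to be analyzed accurately. Using the LSTM model to predict single-region yield and the BPNN model to predict multi-region meteorological yield is feasible, with prediction errors within an acceptable range.

Keywords: meteorological factor, HP filter, sugarcane yield, BPNN model, LSTM model, machine learning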
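
The separation of meteorological yield with the HP filter mentioned above can be illustrated with a minimal sketch. It assumes the usual decomposition in which the smooth HP trend stands for the technology-driven trend yield and the residual (cycle) is taken as the meteorological yield. The smoothing parameter `lamb=100`, the function names, and the yield values are all hypothetical; the source does not state which parameter or software the authors used.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hp_filter(series, lamb=100.0):
    """Return (trend, cycle) from the Hodrick-Prescott filter.

    The trend solves (I + lamb * K^T K) trend = series, where K is the
    second-difference operator. lamb=100 is a common choice for annual
    data and is an assumption here, not a value taken from the paper.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # Start from the identity and add lamb * K^T K row by row.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(n - 2):
        coeffs = {r: 1.0, r + 1: -2.0, r + 2: 1.0}  # one row of K
        for i, ci in coeffs.items():
            for j, cj in coeffs.items():
                a[i][j] += lamb * ci * cj
    trend = solve_linear(a, [float(v) for v in series])
    cycle = [y - t for y, t in zip(series, trend)]
    return trend, cycle


# Hypothetical yearly yields in t/ha (illustrative values only).
demo_yield = [62.0, 65.5, 61.0, 68.2, 70.1, 66.4, 72.0, 75.3, 71.8, 77.5]
demo_trend, demo_meteorological_yield = hp_filter(demo_yield)
```

In this sketch `demo_meteorological_yield` fluctuates around zero by construction, which is what makes detrended values from several regions comparable enough to pool into one model.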

Abstract:

[Objective] Accurate prediction of changes in sugarcane yield in Guangxi can provide an important reference for government policy-making and a decision-making basis for farmers in guiding sugarcane planting, thereby improving sugarcane yield and quality and promoting the development of the sugarcane industry. This research explored the relationship between sugarcane yield and meteorological factors in the main sugarcane-producing areas of Guangxi Zhuang Autonomous Region in order to provide scientific data support for sugar factories and related management departments. [Methods] The study area included five sugarcane planting regions located in five different counties in Guangxi, China. The average yield per hectare of each planting region was provided by Guangxi Sugar Industry Group, which controls the sugar refineries of each planting region. Daily meteorological data covering 14 meteorological factors from 2002 to 2019 were acquired from the National Data Center for Meteorological Sciences to analyze their influence on sugarcane yield. Since meteorological factors can influence sugarcane growth differently during different time spans, a new kind of factor combining a meteorological factor with a time span was defined, such as the average precipitation in August or the average temperature from February to April. The inter-correlations among all the meteorological factors over different time spans and their correlations with yield were then analyzed to screen out the key meteorological factors of sensitive time spans. After that, four algorithms, BP neural network (BPNN), support vector machine (SVM), random forest (RF), and long short-term memory (LSTM), were employed to establish sugarcane apparent yield prediction models for each planting region. Corresponding reference models based on the annual meteorological factors were also built.
Additionally, the meteorological yield of every planting region was extracted by HP filtering, and general meteorological yield prediction models were built on the data of all five planting regions using RF, SVM, BPNN, and LSTM, respectively. [Results and Discussions] The correlation analysis showed that different planting regions had different sensitive meteorological factors and key time spans. The highly representative meteorological factors mainly included sunshine hours, precipitation, and atmospheric pressure. In Region 1, the strongest negative correlation with yield was observed for sunshine hours during October and November, while the strongest positive correlation was found for the minimum relative humidity in November. In Region 2, the strongest positive correlation with yield was observed for the average vapor pressure during February and March, whereas the strongest negative correlation was associated with precipitation in August and September. In Region 3, the strongest positive correlation with yield was found for the 20‒20 precipitation (accumulated from 20:00 to 20:00) during August and September, while the strongest negative correlation was related to sunshine hours in the same period. In Region 4, the strongest positive correlation with yield was observed for the 20‒20 precipitation from March to December, whereas the strongest negative correlation was associated with the highest atmospheric pressure from August to December. In Region 5, the strongest positive correlation with yield was found for the average vapor pressure from June to August, whereas the strongest negative correlation was related to the lowest atmospheric pressure in February and March.
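
The screening of sensitive time spans reported above can be sketched as follows. One way to arrive at the 78 spans reported for this analysis is to enumerate every contiguous run of months within a 12-month season (12 × 13 / 2 = 78); whether this matches the authors' exact construction is an assumption. The function names, the 0-based month indexing, and the synthetic data are hypothetical and serve only to show the mechanics of averaging a factor over each span and correlating it with yield.

```python
import random
from math import sqrt


def month_spans(n_months=12):
    """All contiguous (start, end) month windows, 0-based and inclusive."""
    return [(s, e) for s in range(n_months) for e in range(s, n_months)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def most_sensitive_span(monthly, yields):
    """Find the span whose yearly mean correlates most strongly with yield.

    monthly: one list per year, each holding the monthly means of a single
             meteorological factor.
    yields:  one yield value per year, in the same order.
    Returns (start, end, r) for the span with the largest |r|.
    """
    best = None
    for start, end in month_spans(len(monthly[0])):
        width = end - start + 1
        span_means = [sum(year[start:end + 1]) / width for year in monthly]
        r = pearson(span_means, yields)
        if best is None or abs(r) > abs(best[2]):
            best = (start, end, r)
    return best


# Synthetic data: 18 years of monthly precipitation, with yield driven
# entirely by month index 7 so that the expected answer is known.
rng = random.Random(0)
demo_monthly = [[rng.uniform(50, 250) for _ in range(12)] for _ in range(18)]
demo_yield = [40.0 + 0.1 * year[7] for year in demo_monthly]
demo_start, demo_end, demo_r = most_sensitive_span(demo_monthly, demo_yield)
```

Repeating this search for each of the 14 factors would give one candidate factor and time span pair per factor; the inter-correlation check between candidates that the abstract describes is not shown here.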
For each specific planting region, the apparent yield prediction models based on sensitive meteorological factors during key time spans were clearly more accurate than those based on the annual average meteorological values. The LSTM model performed markedly better than the widely used classic BPNN, SVM, and RF models for both kinds of meteorological factors (under sensitive time spans or annually). The overall root mean square error (RMSE) and mean absolute percentage error (MAPE) of the LSTM model under key time spans were 10.34 t/ha and 6.85%, respectively, with a coefficient of determination Rᵥ² of 0.8489 between the predicted and true values. For the general prediction models of meteorological yield across multiple sugarcane planting regions, the RF, SVM, and BPNN models achieved good results, and the BPNN model performed best, with an RMSE of 0.98 t/ha, a MAPE of 9.59%, and an Rᵥ² of 0.965. The RMSE and MAPE of the LSTM model were 0.25 t/ha and 39.99%, respectively, and its Rᵥ² was 0.77. [Conclusions] Sensitive meteorological factors under key time spans were found to be more significantly correlated with yield than the annual average meteorological factors. The LSTM model performed better at apparent yield prediction for a specific planting region than the classic BPNN, SVM, and RF models, but the BPNN model gave better results than the other models in predicting meteorological yield over multiple sugarcane planting regions.
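
The three evaluation metrics quoted above can be written out as a short sketch using their common textbook definitions. The function names are hypothetical, and defining the coefficient of determination as 1 − SS_res / SS_tot is an assumption: the abstract does not say whether Rᵥ² was computed this way or as the squared correlation between predicted and true values.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the yields (e.g. t/ha)."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Each error is divided by the true value, so true values must be nonzero.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because MAPE divides each error by the true value, it grows quickly when true values are close to zero, which can happen with detrended quantities such as meteorological yield. RMSE and MAPE can therefore rank models differently on the same data.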

Key words: meteorological factor, HP filter, sugarcane yield, BPNN model, LSTM model, machine learning

CLC number: