
Smart Agriculture, 2024, Vol. 6, Issue (1): 89-100. doi: 10.12133/j.smartag.SA202311032

• Special Topic: Intelligent Agricultural Sensor Technology

Distinguishing Moldy Wheat and Its Degree of Mold Using a Portable Visible-Near Infrared Spectrometer and Machine Learning

JIA Wenshen1,2, LYU Haolin1, ZHANG Shang1, QIN Yingdong2, ZHOU Wei3

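  1. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
    2. Institute of Quality Standards and Testing Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    3. Hebei Food Inspection and Research Institute, Shijiazhuang 050000, China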
  • Received: 2023-11-27  Published: 2024-01-30
  • About the author:
    JIA Wenshen, research interests: rapid detection methods for agricultural product safety. E-mail:

    JIA Wenshen, E-mail:

  • Corresponding author:
    ZHANG Shang, Ph.D., associate professor, research interests: computer applications. E-mail:

Using a Portable Visible-Near Infrared Spectrometer and Machine Learning to Distinguish and Quantify Mold Contamination in Wheat

JIA Wenshen1,2, LYU Haolin1, ZHANG Shang1, QIN Yingdong2, ZHOU Wei3

  1. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
    2. Institute of Quality Standards and Testing Technology, Beijing Academy of Agricultural and Forestry Sciences, Beijing 100097, China
    3. Food Inspection and Research Institute, Hebei Food Safety Key Laboratory, Shijiazhuang 050000, China
  • Received: 2023-11-27  Online: 2024-01-30
  • Corresponding author:
    ZHANG Shang, E-mail:
  • Supported by:
    Key Research and Development Projects of Hebei Province (21375501D); Innovation and Capacity Building Project of Beijing Academy of Agriculture and Forestry Sciences (KJCX20230438); National Natural Science Foundation of China (31801634)

Abstract (Chinese version):

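Objective Visible-near infrared (Visible-NIR) spectroscopy enables rapid, non-destructive detection of mold in wheat, but high-resolution spectrometers are expensive and bulky, which hinders their adoption in agricultural settings. This study therefore optimized the processing of low-resolution spectral data, with the aim of approaching the performance of a high-resolution spectrometer in identifying moldy wheat.

Methods A visible-near infrared agricultural product detector (model VNIAPD, resolution 1.6 nm) and a Fuxiang fiber-optic spectrometer (model SINO2040, resolution 0.19 nm) were used to collect spectra of 100 wheat samples in the fresh state and at different stages of mold growth. First, the SINO2040 spectra were cropped to the VNIAPD wavelength range so that both covered 640–1 050 nm. The spectra were then processed with several preprocessing methods, namely standard deviation normalization (SDN), standard normal variate (SNV), mean centering (MC), first-order derivative (1ST), Savitzky-Golay smoothing (SG), and multiplicative scatter correction (MSC), and outliers were identified and removed with the local outlier factor (LOF) algorithm. Next, the successive projections algorithm (SPA) and the least absolute shrinkage and selection operator (LASSO) were used to extract characteristic wavelengths from the preprocessed spectra. Finally, six algorithms, K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), Naïve-Bayes, back propagation neural network (BPNN), and deep neural network (DNN), were used to model the characteristic-wavelength spectra in order to identify moldy wheat and distinguish the degree of mold.

Results and discussions The two neural network models, BPNN and DNN, both reached 100% test set accuracy but required long modeling times and large model sizes, whereas the shallow models KNN, SVM, RF, and Naïve-Bayes reached test set accuracies of 93.18% to 100% with fast modeling and small model sizes. Although the VNIAPD used in this study has a coarser optical resolution (1.6 nm) than the SINO2040 (0.19 nm) and costs less, it reached the same level of detection accuracy.

Conclusions By comparing different spectral preprocessing methods, this study identified the best data optimization choice for each algorithm, enabling the low-resolution VNIAPD to match the high-resolution SINO2040 in detecting moldy wheat and offering a new option for low-cost, non-destructive detection of wheat mold based on Visible-NIR spectroscopy.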
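The cropping and preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not define SDN, so it is assumed here to be per-wavelength z-score scaling; the first derivative is approximated by forward differences on an assumed uniform wavelength grid; and SG smoothing uses the standard 5-point, second-order window. All function names are hypothetical.

```python
from statistics import mean, pstdev

def crop(wavelengths, intensities, lo=640.0, hi=1050.0):
    """Keep only the points whose wavelength (nm) lies in [lo, hi]."""
    kept = [(w, y) for w, y in zip(wavelengths, intensities) if lo <= w <= hi]
    return [w for w, _ in kept], [y for _, y in kept]

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    m, s = mean(spectrum), pstdev(spectrum)
    return [(y - m) / s for y in spectrum]

def mean_center(spectra):
    """Mean centering: subtract the per-wavelength mean over all samples."""
    col_means = [mean(col) for col in zip(*spectra)]
    return [[y - m for y, m in zip(row, col_means)] for row in spectra]

def sdn(spectra):
    """ASSUMED meaning of SDN: per-wavelength z-score (mean 0, std 1 across samples)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*spectra)]  # guard std == 0
    return [[(y - m) / s for y, (m, s) in zip(row, stats)] for row in spectra]

def first_derivative(spectrum):
    """First derivative as forward differences (assumes uniform spacing; one point shorter)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

SG5 = (-3, 12, 17, 12, -3)  # 5-point, 2nd-order Savitzky-Golay smoothing weights (sum = 35)

def sg_smooth(spectrum):
    """SG smoothing, window 5, order 2; the two points at each edge are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i - 2 + k] for k, c in enumerate(SG5)) / 35
    return out

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    ref = [mean(col) for col in zip(*spectra)]
    r_bar = mean(ref)
    r_ss = sum((r - r_bar) ** 2 for r in ref)
    corrected = []
    for row in spectra:
        x_bar = mean(row)
        # Least-squares fit of this spectrum against the reference: row ≈ a + b * ref
        b = sum((r - r_bar) * (x - x_bar) for r, x in zip(ref, row)) / r_ss
        a = x_bar - b * r_bar
        corrected.append([(x - a) / b for x in row])
    return corrected
```

For example, `crop([600, 640, 800, 1050, 1100], [1, 2, 3, 4, 5])` keeps only the 640, 800, and 1050 nm points. In practice a library routine such as SciPy's `savgol_filter` (which also supports derivative filters via its `deriv` argument) would normally replace the hand-written SG and derivative steps.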

Key words: Visible-NIR spectroscopy, wheat mold, machine learning, nondestructive detection, food safety, neural networks

Abstract:

Objective Traditional methods for detecting mold are time-consuming, labor-intensive, and vulnerable to environmental influences, so a fast, accurate, and reliable detection approach is needed. Visible-near infrared (Visible-NIR) spectroscopy has been used for rapid, non-destructive assessment of wheat moisture content, crude protein content, hidden insect infestation, starch content, dry matter, weight, hardness, origin, and other attributes. However, most of these studies rely on research-grade Visible-NIR spectrometers typically found in laboratories. Although these instruments offer superior detection accuracy and stability, their bulk, lack of portability, and high cost hinder their adoption across agricultural product distribution channels. Methods A low-resolution Visible-NIR spectrometer (VNIAPD, resolution 1.6 nm) was used to collect wheat spectra, with the aim of improving the accuracy of moldy wheat detection by identifying the most suitable spectral preprocessing method for each modeling algorithm. A high-resolution Visible-NIR spectrometer (SINO2040, resolution 0.19 nm) served as a reference to validate the effectiveness of the instrument and the method. One hundred samples of the wheat cultivar Zhoumai 22 were prepared. The spectra of the fresh wheat were scanned first; the samples were then placed in a constant-temperature chamber at 35 °C, which provides conditions favorable to mold growth and thereby accelerates the proliferation of naturally occurring mold. The degree of mold was graded by cultivation time in the chamber: wheat was classified as mildly, moderately, or severely moldy after 3, 6, and 9 days of cultivation, respectively. In total, 400 spectra were collected: 100 each from fresh wheat and from wheat cultured for 3, 6, and 9 days.
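The grading scheme above maps cultivation time to class labels for the two classification tasks. The sketch below assumes that fresh samples are excluded from the ternary (three-level) task and that all 400 spectra are used before outlier removal; the abstract does not say how the binary classes were balanced, so under this assumption the binary task is 3:1 imbalanced. The helper names are hypothetical.

```python
from collections import Counter
from typing import Optional

# Cultivation days at 35 °C -> mold grade, following the grading in the abstract.
GRADE_BY_DAYS = {0: "fresh", 3: "mild", 6: "moderate", 9: "severe"}

def binary_label(days: int) -> str:
    """Binary task: fresh vs. moldy (any cultivation time > 0)."""
    return "fresh" if GRADE_BY_DAYS[days] == "fresh" else "moldy"

def ternary_label(days: int) -> Optional[str]:
    """Ternary task: mild / moderate / severe; fresh samples return None (not used)."""
    grade = GRADE_BY_DAYS[days]
    return None if grade == "fresh" else grade

# 100 spectra per stage, 400 in total (assumed: before any LOF outlier removal).
days_per_spectrum = [d for d in (0, 3, 6, 9) for _ in range(100)]
binary_counts = Counter(binary_label(d) for d in days_per_spectrum)
ternary_counts = Counter(t for t in map(ternary_label, days_per_spectrum) if t is not None)
print(binary_counts)   # Counter({'moldy': 300, 'fresh': 100})
print(ternary_counts)  # Counter({'mild': 100, 'moderate': 100, 'severe': 100})
```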
Preprocessing methods, namely standard deviation normalization (SDN), standard normal variate (SNV), mean centering (MC), first-order derivative (1ST), Savitzky-Golay smoothing (SG), and multiplicative scatter correction (MSC), were applied to the spectral data, and outliers were identified and removed with the local outlier factor (LOF) method. The successive projections algorithm (SPA) and the least absolute shrinkage and selection operator (LASSO) were then used to extract characteristic wavelengths from the preprocessed spectra. Six algorithms, k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), Naïve-Bayes, back propagation neural network (BPNN), and deep neural network (DNN), were employed to model the characteristic-wavelength spectra, both to distinguish moldy from fresh wheat and to classify the degree of mold. Models were evaluated on accuracy, modeling time, and model size to support the choice of the most suitable model for a given application scenario. Results and discussions In terms of accuracy, with the computationally slower and more memory-demanding neural network models BPNN and DNN, both the VNIAPD and the SINO2040 achieved 100% accuracy in the binary task of distinguishing fresh from moldy wheat, and also 100% in the ternary task of distinguishing the three levels of mold. With the faster and more memory-efficient shallow models (KNN, SVM, RF, and Naïve-Bayes), the VNIAPD reached its highest binary test set accuracy, 97.72%, with RF, whereas the SINO2040 reached 100% with Naïve-Bayes. In the ternary task, the VNIAPD reached 100% accuracy with both KNN and RF, while the SINO2040 reached 97.72% with KNN and SVM.
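SPA selects wavelengths whose columns, after projection, are least collinear with the wavelengths already chosen. The sketch below implements only the core projection loop of the successive projections algorithm in plain Python; the starting wavelength `k0` and the number of wavelengths are assumed to be given, whereas in practice they are usually chosen by validation, which is omitted here. The function name is hypothetical.

```python
def spa(X, k0, n_select):
    """Successive projections algorithm (core loop only).

    X: samples x wavelengths matrix as a list of rows.
    k0: index of the starting wavelength.
    Returns up to n_select wavelength indices, in selection order.
    """
    cols = [list(map(float, c)) for c in zip(*X)]  # one vector per wavelength
    selected = [k0]
    while len(selected) < min(n_select, len(cols)):
        ref = cols[selected[-1]]  # last selected column (already projected)
        ref_ss = sum(v * v for v in ref)
        if ref_ss == 0.0:  # degenerate reference column: nothing left to project on
            break
        best, best_norm = None, -1.0
        for j, col in enumerate(cols):
            if j in selected:
                continue
            # Project column j onto the orthogonal complement of ref and keep the result.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_ss
            cols[j] = [a - coef * b for a, b in zip(col, ref)]
            norm = sum(v * v for v in cols[j]) ** 0.5
            if norm > best_norm:
                best, best_norm = j, norm
        if best_norm <= 0.0:  # remaining columns are fully explained by those selected
            break
        selected.append(best)
    return selected
```

With the columns `[1, 0, 0]`, `[1, 0.01, 0]`, and `[0, 0, 1]`, starting from column 0 the sketch picks column 2 before the nearly collinear column 1, which is the redundancy-avoiding behavior SPA is used for.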
In terms of modeling speed, the shallow machine learning algorithms (KNN, SVM, RF, and Naïve-Bayes) trained faster, with Naïve-Bayes the fastest at just 3 ms, whereas the neural network algorithms BPNN and DNN took 3 293 and 18 614 ms, respectively. In terms of memory footprint, BPNN produced the largest model (4 028 KB) and SVM the smallest (only 4 KB). Overall, the VNIAPD matched the SINO2040 in detection accuracy despite its considerably coarser optical resolution (1.6 nm versus 0.19 nm) and lower cost, highlighting its cost-effectiveness for this application. Conclusions By comparing different spectral preprocessing methods, this study identified the optimal preprocessing choice for each algorithm. As a result, the low-resolution VNIAPD achieved performance on par with the high-resolution SINO2040 in detecting moldy wheat, providing a new option for low-cost, non-destructive detection of wheat mold and its degree based on Visible-NIR spectroscopy.
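The three evaluation criteria (accuracy, modeling time, and model size) can be measured with a small harness like the one below. It is a sketch, not the authors' setup: it uses a hypothetical minimal KNN classifier, times only the fit step with `time.perf_counter`, and approximates model size by the length of the pickled fitted state, which is only a rough proxy for the model sizes reported above.

```python
import pickle
import time
from collections import Counter

class MinimalKNN:
    """Hypothetical k-nearest-neighbour classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" only stores the reference spectra.
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Squared Euclidean distance gives the same neighbour ranking as Euclidean.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(x, row)), label)
                for row, label in zip(self.X, self.y)
            )
            votes = Counter(label for _, label in dists[: self.k])
            preds.append(votes.most_common(1)[0][0])
        return preds

def evaluate(model, X_train, y_train, X_test, y_test):
    """Return test accuracy, fit time (ms), and an approximate model size (KB)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_ms = (time.perf_counter() - start) * 1000
    preds = model.predict(X_test)
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    # Proxy for model size: bytes of the pickled fitted state.
    size_kb = len(pickle.dumps(vars(model))) / 1024
    return {"accuracy": accuracy, "fit_ms": fit_ms, "size_kb": size_kb}

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["fresh"] * 3 + ["moldy"] * 3
result = evaluate(MinimalKNN(k=3), X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], ["fresh", "moldy"])
print(result["accuracy"])  # 1.0
```

Because KNN only stores its training data, its fit time is near zero while its size grows with the training set; a neural network shows the opposite profile, which is the trade-off the timing and memory results above describe.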

Key words: Visible-NIR spectroscopy, wheat mold, machine learning, nondestructive detection, food safety, neural networks