Using a Portable Visible-near Infrared Spectrometer and Machine Learning to Distinguish and Quantify Mold Contamination in Wheat

doi:10.12133/j.smartag.SA202311032

Abstract

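Abstract (translated from the Chinese version):

Objective/Significance: Visible-near infrared (Visible-NIR) spectroscopy enables rapid, non-destructive detection of mold in wheat, but high-resolution spectrometers are expensive and bulky, which hinders their adoption in agricultural settings. This study therefore optimizes the processing of low-resolution spectral data, with the aim of approaching the performance of a high-resolution spectrometer in identifying moldy wheat. Methods: A visible-near infrared agricultural product detector (model VNIAPD, resolution 1.6 nm) and a Fuxiang fiber-optic spectrometer (model SINO2040, resolution 0.19 nm) were used to collect spectra of 100 wheat samples in the fresh state and in different moldy states. First, the SINO2040 spectra were cropped to match the VNIAPD wavelength range of 640 to 1 050 nm. The spectra were then processed with several preprocessing methods, including standard deviation normalization (SDN), standard normal variate (SNV), mean centering (MC), first-order derivative (1ST), Savitzky-Golay smoothing (SG), and multiplicative scatter correction (MSC), and outliers were identified with the local outlier factor (LOF) algorithm and removed. Next, the successive projections algorithm (SPA) and the least absolute shrinkage and selection operator (LASSO) were used to extract characteristic wavelengths from the preprocessed spectra. Finally, six algorithms, namely K-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), Naïve-Bayes, back propagation neural network (BPNN), and deep neural network (DNN), were used to model the characteristic-wavelength spectra in order to identify moldy wheat and grade the degree of mold. Results and Discussion: The two neural network models, BPNN and DNN, both reached a test-set accuracy of 100%, but required long modeling times and large model sizes. The shallow models KNN, SVM, RF, and Naïve-Bayes reached test-set accuracies of 93.18% to 100% with fast modeling and small model sizes. Although the VNIAPD has lower optical specifications than the SINO2040 (optical resolution 1.6 nm versus 0.19 nm) and costs less, its detection accuracy reached the same level. Conclusions: By comparing different preprocessing methods for the spectral data, this study identified the best data-optimization choice for each algorithm, allowing the low-resolution VNIAPD to match the high-resolution SINO2040 in detecting moldy wheat and providing a new option for low-cost, non-destructive detection of wheat mold based on Visible-NIR spectroscopy.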

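Keywords (translated from the Chinese version): visible-near infrared spectroscopy, wheat mold, machine learning, non-destructive detection, food safety, neural networks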

Abstract:

Objective: Traditional methods for detecting mold are time-consuming, labor-intensive, and vulnerable to environmental influences, which highlights the need for a rapid, accurate, and reliable detection approach. Researchers have used visible-near infrared (Visible-NIR) spectroscopy for the non-destructive, rapid assessment of wheat moisture content, crude protein content, concealed pests, starch content, dry matter, weight, hardness, origin, and other attributes. However, most of these studies rely on research-grade Visible-NIR spectrometers typically found in laboratories. While these spectrometers offer superior detection accuracy and stability, their bulk, lack of portability, and high cost hinder their adoption across agricultural product distribution channels. Methods: A low-resolution Visible-NIR spectrometer (VNIAPD, resolution 1.6 nm) was used to collect wheat spectra. The aim was to improve the accuracy of moldy wheat detection by identifying the spectral preprocessing method best suited to each algorithm. A high-resolution Visible-NIR spectrometer (SINO2040, resolution 0.19 nm) served as a control to validate the effectiveness of the instrument and the method. The Zhoumai 22 wheat variety was used, and 100 samples were prepared. The fresh wheat samples were scanned and then placed in a constant-temperature chamber at 35 °C, a condition favorable to mold growth, to accelerate the reproduction of naturally occurring mold in the wheat. The degree of mold was categorized by cultivation time in the chamber: wheat was classified as mildly, moderately, or severely moldy after 3, 6, and 9 days of cultivation, respectively. In total, 400 wheat spectra were collected: 100 each for fresh wheat and for wheat cultured for 3, 6, and 9 days.
Preprocessing methods, namely standard deviation normalization (SDN), standard normal variate (SNV), mean centering (MC), first-order derivative (1ST), Savitzky-Golay smoothing (SG), and multiplicative scatter correction (MSC), were applied to the spectral data. Outliers were identified and removed using the local outlier factor (LOF) method. The successive projections algorithm (SPA) and the least absolute shrinkage and selection operator (LASSO) were then used to extract characteristic wavelengths from the preprocessed spectra. Six algorithms, k-nearest neighbors (KNN), support vector machine (SVM), random forest (RF), Naïve-Bayes, back propagation neural network (BPNN), and deep neural network (DNN), were used to model the characteristic-wavelength spectra, distinguishing moldy from fresh wheat and classifying the degree of mold. Models were evaluated on accuracy, modeling time, and model size to support selecting the most suitable model for a given application scenario. Results and discussions: Regarding accuracy, with the computationally slower and more memory-demanding neural network models BPNN and DNN, both the VNIAPD and the SINO2040 achieved 100% accuracy in the binary classification task of distinguishing fresh from moldy wheat, and also 100% accuracy in the ternary classification task of distinguishing three degrees of mold. With the faster and more memory-efficient shallow models (KNN, SVM, RF, and Naïve-Bayes), the VNIAPD reached a best test-set accuracy of 97.72% for binary classification, obtained with RF, whereas the SINO2040 reached 100% with Naïve-Bayes. For ternary classification, the VNIAPD reached 100% accuracy with both KNN and RF, while the SINO2040 reached 97.72% with KNN and SVM.
Regarding modeling speed, the shallow machine learning algorithms (KNN, SVM, RF, and Naïve-Bayes) trained faster, with Naïve-Bayes the fastest at 3 ms. In contrast, the neural network algorithms BPNN and DNN took 3 293 and 18 614 ms, respectively. Regarding memory footprint, BPNN had the largest model size at 4 028 KB, whereas SVM was the smallest at only 4 KB. Overall, the VNIAPD matched the SINO2040 in detection accuracy despite its lower optical specifications (an optical resolution of 1.6 nm versus 0.19 nm for the SINO2040) and its lower cost, which highlights its cost-effectiveness for this task. Conclusions: By comparing different preprocessing methods for spectral data, this study identified the best preprocessing choice for each algorithm. As a result, the low-resolution spectrometer VNIAPD achieved performance on par with the high-resolution spectrometer SINO2040 in detecting moldy wheat, providing a new option for low-cost, non-destructive detection of wheat mold and the degree of moldiness based on Visible-NIR spectroscopy.

Key words: Visible-NIR spectroscopy, wheat mold, machine learning, nondestructive detection, food safety, neural networks

JIA Wenshen, LYU Haolin, ZHANG Shang, QIN Yingdong, ZHOU Wei. Using a Portable Visible-near Infrared Spectrometer and Machine Learning to Distinguish and Quantify Mold Contamination in Wheat[J]. Smart Agriculture (智慧农业(中英文)), 2024, 6(1): 89-100.

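Preprocessing method	Description	Parameter settings	Expected effect
Standard deviation normalization (SDN)	Rescales the data to zero mean and unit standard deviation	No specific parameters	Removes the influence of units and scale so that different features are comparable
Standard normal variate (SNV)	Transforms the data to follow a standard normal distribution	No specific parameters	Improves the data distribution, bringing it closer to normal
Mean centering (MC)	Subtracts the overall mean from each data point	No specific parameters	Removes long-term trends or baseline drift in the data
First-order derivative (1ST)	Computes the first derivative of the data	Forward difference	Emphasizes changes in spectral features and reduces baseline interference
Savitzky-Golay smoothing (SG)	Smooths the data by local polynomial fitting	Smoothing window size	Reduces random noise while preserving the basic shape and features of the signal
Multiplicative scatter correction (MSC)	Corrects spectral variation caused by scattering	No specific parameters	Reduces or removes scatter-induced spectral variation and improves comparability between samples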
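
The preprocessing steps in the table above can be sketched in a few lines of standard-library Python. This is a minimal illustration under the usual chemometric definitions, not the authors' implementation: the function names are hypothetical, the derivative assumes evenly spaced wavelengths with unit step, SNV is taken as per-spectrum centering and scaling, mean centering is taken column-wise, the MSC reference is assumed to be the mean spectrum, and the Savitzky-Golay filter is fixed to a 5-point quadratic window with edges left unsmoothed.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale ONE spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def mean_center(spectra):
    """Subtract the mean spectrum (column-wise mean) from every spectrum."""
    avg = [mean(col) for col in zip(*spectra)]
    return [[x - a for x, a in zip(row, avg)] for row in spectra]

def first_derivative(spectrum):
    """Forward difference, assuming unit wavelength spacing (output is one point shorter)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Classic 5-point quadratic Savitzky-Golay smoothing coefficients (they sum to 35).
SG5 = (-3, 12, 17, 12, -3)

def sg_smooth5(spectrum):
    """Savitzky-Golay smoothing with a 5-point window; the two edge points on each side are kept as-is."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / 35
    return out

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is regressed on the reference r (x = intercept + slope * r)
    and then corrected as (x - intercept) / slope.
    """
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for row in spectra:
        row_mean = mean(row)
        slope = sum((r - ref_mean) * (x - row_mean) for r, x in zip(ref, row)) / ref_var
        intercept = row_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in row])
    return corrected

# Toy spectra (made-up numbers): the second is an affine distortion of the first.
toy = [[1.0, 2.0, 4.0, 3.0], [3.0, 5.0, 9.0, 7.0]]
msc_then_1st = [first_derivative(row) for row in msc(toy)]  # an "MSC-1ST" style chain
```

Chaining two functions, as in the last line, mirrors combined labels such as MSC-1ST or SG-1ST used in the accuracy tables; the order of operations in the paper's own pipeline is assumed here from those labels.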

Model	VNIAPD (binary)	VNIAPD (ternary)	SINO2040 (binary)	SINO2040 (ternary)
KNN	Neighbors = 3	Neighbors = 3	Neighbors = 5	Neighbors = 5
SVM	C = 1, Gamma = 1.19	C = 1, Gamma = 1.05	C = 1.25, Gamma = 1.15	C = 1.25, Gamma = 1.08
RF	n = 20, features = 5, depth = 16	n = 15, features = 3, depth = 20	n = 20, features = 5, depth = 18	n = 10, features = 5, depth = 20
Naïve-Bayes	Gaussian	Gaussian	Gaussian	Gaussian
BPNN	learning-rate = 1e-4, epoch = 300	learning-rate = 1e-4, epoch = 400	learning-rate = 1e-4, epoch = 500	learning-rate = 1e-3, epoch = 600
DNN	learning-rate = 1e-5, epoch = 80	learning-rate = 1e-5, epoch = 100	learning-rate = 1e-4, epoch = 100	learning-rate = 1e-4, epoch = 200
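
To make the simplest entry in the hyperparameter table concrete, the sketch below shows a k-nearest-neighbors classifier with the neighbor count of 3 listed for the VNIAPD. It is a plain standard-library illustration, not the authors' code: `knn_predict`, the Euclidean distance metric, the majority-vote rule, and the two-feature toy data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    Distance is Euclidean (an assumption; the table only gives the neighbor count).
    """
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature vectors standing in for selected-wavelength intensities (made-up numbers).
train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
train_y = ["fresh", "fresh", "fresh", "moldy", "moldy", "moldy"]

label_a = knn_predict(train_X, train_y, [0.05, 0.10], k=3)
label_b = knn_predict(train_X, train_y, [1.00, 0.95], k=3)
```

A small odd k such as 3 avoids tied votes in the binary task; for the three-class task ties remain possible and a tie-breaking rule would be needed in practice.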

Model \ Preprocessing (accuracy, %)	MSC-1ST	SDN-1ST	1ST	MC-1ST	SG-1ST
KNN	93.18	93.18	90.90	88.63	90.90
SVM	90.90	90.90	86.36	93.18	90.90
RF	86.36	93.18	95.45	97.72	93.18
Naïve-Bayes	75.00	77.27	95.45	84.09	86.36
BPNN	97.72	100.00	97.72	84.09	86.36
DNN	97.72	93.18	97.70	97.70	100.00
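
One way to read the accuracy table above is to pick, for each model, the preprocessing combination with the highest accuracy. The short Python sketch below does this with the values copied from the table as printed. The names `PREPROCESSING`, `ACCURACY`, and `best_preprocessing` are illustrative and not from the paper, and ties are reported together rather than broken.

```python
# Column order of the table above.
PREPROCESSING = ["MSC-1ST", "SDN-1ST", "1ST", "MC-1ST", "SG-1ST"]

# Accuracies (%) copied row by row from the table above.
ACCURACY = {
    "KNN": [93.18, 93.18, 90.90, 88.63, 90.90],
    "SVM": [90.90, 90.90, 86.36, 93.18, 90.90],
    "RF": [86.36, 93.18, 95.45, 97.72, 93.18],
    "Naive-Bayes": [75.00, 77.27, 95.45, 84.09, 86.36],
    "BPNN": [97.72, 100.00, 97.72, 84.09, 86.36],
    "DNN": [97.72, 93.18, 97.70, 97.70, 100.00],
}

def best_preprocessing(accuracy, methods):
    """Return {model: (best accuracy, [methods reaching it])}."""
    best = {}
    for model, scores in accuracy.items():
        top = max(scores)
        best[model] = (top, [m for m, s in zip(methods, scores) if s == top])
    return best

BEST = best_preprocessing(ACCURACY, PREPROCESSING)
for model, (score, methods) in BEST.items():
    print(f"{model}: {score:.2f}% with {', '.join(methods)}")
```

Accuracy alone does not settle the choice: the abstract also weighs modeling time and model size, so a shallow model a few points below 100% may still be the better fit for a portable device.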

Model \ Preprocessing (accuracy, %)	MSC-1ST	SDN-1ST	1ST	MC-1ST	SG-1ST
KNN	100.00	90.90	86.36	95.45	86.36
SVM	90.90	95.45	93.18	97.72	93.18
RF	93.18	90.90	86.36	100.00	90.90
Naïve-Bayes	90.90	86.36	97.72	88.63	93.18
BPNN	97.72	100.00	95.45	93.18	93.18
DNN	95.54	97.72	93.18	97.72	100.00
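
The abstract evaluates each model on three criteria: accuracy, modeling time, and model size. The sketch below shows one way such a measurement could be set up in standard-library Python. It is only an illustration under stated assumptions: `NearestCentroid` is a hypothetical stand-in classifier (not one of the six models in the paper), `evaluate` is an illustrative helper, training time is measured with a wall-clock timer, and model size is approximated by the pickled size of the fitted parameters.

```python
import pickle
import time

class NearestCentroid:
    """Hypothetical stand-in model: assigns each sample to the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def nearest(row):
            return min(
                self.centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, self.centroids[lab])),
            )
        return [nearest(row) for row in X]

def evaluate(model, X_train, y_train, X_test, y_test):
    """Return accuracy (%), training time (ms), and serialized parameter size (KB)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_ms = (time.perf_counter() - start) * 1000.0

    predictions = model.predict(X_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    accuracy_pct = 100.0 * correct / len(y_test)

    # Size of the fitted parameters only, as a rough proxy for model size.
    size_kb = len(pickle.dumps(vars(model))) / 1024.0
    return {"accuracy_pct": accuracy_pct, "train_ms": train_ms, "size_kb": size_kb}

# Toy data (made-up numbers) standing in for selected-wavelength features.
X_train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y_train = ["fresh", "fresh", "moldy", "moldy"]
X_test = [[0.05, 0.05], [0.95, 0.95]]
y_test = ["fresh", "moldy"]

report = evaluate(NearestCentroid(), X_train, y_train, X_test, y_test)
```

Timing a single fit is noisy at the millisecond scale reported in the abstract, so repeating the fit and averaging would give steadier numbers.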
