
Smart Agriculture

   

Online Detection System for Freshness of Fruits and Vegetables Based on Temporal Multi-source Information Fusion

HUANG Xianguo, ZHU Qibing, HUANG Min

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
  • Received: 2025-05-30 Online: 2025-10-21
  • Foundation items: National Key R&D Program of China (2022YFD2100601)
  • About author:

    HUANG Xianguo, E-mail:

  • Corresponding author:
    ZHU Qibing, E-mail:

Abstract:

[Objective] Real-time, accurate quality monitoring of fruits and vegetables during cold chain logistics is important for ensuring supply chain quality and reducing economic losses. Traditional detection methods, however, share several core deficiencies: they operate offline, rely on unimodal information, and cannot capture how freshness evolves over time. To overcome these limitations, an online freshness detection system for fruits and vegetables based on temporal multi-source information fusion was proposed and implemented. The system was designed to detect fruit and vegetable freshness online with high precision, providing a technical solution for refined management and early spoilage warning in the cold chain and thereby reducing economic losses.
[Methods] The complete system consisted of a lower-computer data acquisition node, an IoT cloud platform, and an upper-computer Qt client. Through a self-designed portable acquisition node, the lower computer synchronously collected environmental time-series sensor data (temperature, humidity, CO2, ethylene) and time-series images of indicator tags. A novel co-attention-based convolutional recurrent network (Co-ACRN) deep learning model was proposed to mine the complex correlations between these two heterogeneous time-series data streams. The model employed a dual "co-attention + self-attention" mechanism. First, in the early fusion stage, a co-attention module aligned and integrated the visual and sensor feature sequences by constructing a cross-modal affinity matrix. Next, the fused sequence was fed into a long short-term memory (LSTM) network to encode cumulative temporal effects. Finally, a self-attention module reviewed the global context of the LSTM output to capture long-range temporal dependencies.
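The two attention stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' Co-ACRN implementation: the bilinear affinity form, the concatenation used for fusion, the random weights, and all sizes and names (`co_attention_fuse`, `self_attention`, `w_affinity`, `d_att`) are assumptions chosen for illustration, and the LSTM encoder between the two stages is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def co_attention_fuse(visual, sensor, w_affinity):
    """Fuse two time-aligned feature sequences via a cross-modal affinity matrix.

    visual:     (T, dv) per-step image features (e.g. CNN outputs)
    sensor:     (T, ds) per-step sensor features
    w_affinity: (dv, ds) bilinear weight (learned in practice; assumed here)
    """
    # affinity[i, j] scores visual step i against sensor step j.
    affinity = np.tanh(visual @ w_affinity @ sensor.T)      # (T, T)
    # Each visual step gathers a weighted summary of the sensor sequence ...
    sensor_ctx = softmax(affinity, axis=1) @ sensor         # (T, ds)
    # ... and each sensor step gathers a summary of the visual sequence.
    visual_ctx = softmax(affinity.T, axis=1) @ visual       # (T, dv)
    # Fusion by concatenation is an assumption of this sketch.
    return np.concatenate([visual, sensor_ctx, sensor, visual_ctx], axis=1)


def self_attention(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence h of shape (T, d)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                 # (T, T)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
T, dv, ds, d_att = 6, 8, 4, 16                              # hypothetical sizes
visual = rng.normal(size=(T, dv))
sensor = rng.normal(size=(T, ds))
fused = co_attention_fuse(visual, sensor, rng.normal(size=(dv, ds)))

# The LSTM encoder is omitted here: `fused` stands in for its output.
d = fused.shape[1]
context, weights = self_attention(
    fused, *(rng.normal(size=(d, d_att)) for _ in range(3))
)
print(fused.shape, context.shape, weights.shape)
```

In this sketch each row of the affinity matrix tells one visual time step which sensor time steps to weight most heavily, and each column does the same in the other direction. That is one common way to align two heterogeneous sequences before a recurrent encoder.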
In the implementation, visual features were extracted by a lightweight convolutional neural network (CNN) with two convolution-pooling layers; the co-attention module computed its weights from context-aware intermediate features; and the self-attention module used the standard scaled dot-product attention mechanism. For deployment, the model was deployed to the Qt client in the Open Neural Network Exchange (ONNX) format, achieving real-time, edge-side inference.
[Results and Discussions] Experimental results showed that the proposed Co-ACRN model achieved an overall accuracy of 96.93% on the test set in the three-class mango freshness detection task, significantly outperforming mainstream baselines and advanced temporal multimodal fusion models such as modality-invariant and -specific representations for multimodal sentiment analysis (MISA), recurrent attended variation embedding network (RAVEN), multimodal transformer (MulT), and heterogeneous hierarchical message passing network (HHMPN). To verify the rationale of the model design, two sets of ablation experiments were conducted. The input-based ablation study showed that combining time-series information with multimodal information is a prerequisite for accurate detection: every model that relied on unimodal or static information exhibited a significant performance bottleneck. The architecture-based ablation study further confirmed the benefit of the proposed dual-attention design: compared with a backbone network without any attention mechanism, accuracy improved by more than five percentage points, and recall for the critical "spoiled" category reached 99.16%.
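To make the reported metrics concrete, the sketch below shows how overall accuracy and per-class recall are read from a three-class confusion matrix, and how the share of errors falling between adjacent freshness grades can be measured. The counts are hypothetical and are not the study's results; the class names other than "spoiled" and the helper names are also assumptions.

```python
import numpy as np

# Hypothetical counts for illustration only; NOT the paper's data.
# Rows are true classes, columns are predicted classes, ordered by decay.
labels = ["fresh", "sub-fresh", "spoiled"]   # assumed names except "spoiled"
cm = np.array([
    [48,  2,  0],
    [ 3, 45,  2],
    [ 0,  1, 49],
])


def accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """For each true class: correct predictions / samples of that class."""
    return np.diag(cm) / cm.sum(axis=1)


def adjacent_error_share(cm):
    """Share of misclassifications that land in a neighbouring class."""
    i, j = np.indices(cm.shape)
    errors = cm[i != j].sum()
    adjacent = cm[np.abs(i - j) == 1].sum()
    return adjacent / errors if errors else float("nan")


print("accuracy:", round(float(accuracy(cm)), 4))
for name, r in zip(labels, per_class_recall(cm)):
    print(f"recall[{name}]:", round(float(r), 4))
print("adjacent error share:", float(adjacent_error_share(cm)))
```

A high recall for the last class means few truly spoiled samples are passed as edible, which is the costly error in a cold chain. An adjacent error share close to 1 means the remaining mistakes are confusions between neighbouring grades.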
Analysis of the confusion matrix revealed that the vast majority of the model's errors occurred between adjacent categories, whose physical states are most similar, with no serious misclassifications across non-adjacent categories, which demonstrates the model's robustness. After deployment on the client, a single diagnosis took less than 2 s, confirming that the solution combines high accuracy with real-time performance.
[Conclusions] The developed online detection system and Co-ACRN model enabled real-time, accurate, and non-destructive detection of fruit and vegetable freshness. The findings indicate that combining co-attention and self-attention mechanisms can effectively address the fusion challenges of complex multimodal temporal data. Overall, this study provides a complete solution that combines theoretical innovation with engineering practicality for the online, intelligent detection of fruit and vegetable freshness in distributed settings, and offers a basis for further theoretical and practical work in this field.

Key words: fruits and vegetables, cold chain monitoring, temporal multimodal fusion, dual attention mechanism, online non-destructive testing

CLC Number: