Smart Agriculture

   

Intelligent Q&A Method for Crop Pests and Diseases Using LLM Augmented by Adaptive Hybrid Retrieval

YANG Jun1, YANG Wanxia1, YANG Sen1, HE Liang2,3, ZHANG Di1

  1. College of Electrical and Mechanical Engineering, Gansu Agricultural University, Lanzhou 730070, China
  2. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  3. Beijing National Research Center for Information Science and Technology/Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2025-06-13 Online: 2025-10-09
  • Foundation items: The National Key R&D Program of China (2022ZD0115801)
  • About author:

    YANG Jun, E-mail:

  • corresponding author:
    YANG Wanxia, E-mail:

Abstract:

[Objective] Extracting valuable knowledge from vast amounts of dispersed, heterogeneous, and unstructured agricultural big data, correlating and structuring it, and using it to enhance large models in intelligent question-answering systems enables services to be delivered effectively across the agricultural sector. This approach can rapidly advance scientific, precision-oriented agricultural production. However, existing agricultural Q&A systems lack sufficient semantic understanding of complex symptoms, while general-purpose large language models (LLMs) produce factual hallucinations because their training data cover the domain incompletely. This study aims to address the insufficient scale and low quality of knowledge bases constructed for the agricultural field.

[Methods] First, pest and disease data were collected for five typical crops: wheat, rice, corn, potatoes, and cotton. Outliers were identified and removed through manual verification, yielding 87 901 unstructured data entries. Then, a few-shot learning model was employed to extract the entities defined in the pattern layer, and these entities were aligned using BERT semantic vectors and LLM prompt engineering, yielding a triplet knowledge base of 916 239 entries for knowledge retrieval. A retrieval-augmented LLM approach for intelligent Q&A on crop pests and diseases, adaptive hybrid retrieval-augmented generation (AHR-RAG), was proposed. First, an overlapping mechanism was introduced during fixed-length segmentation to mitigate semantic fragmentation. At the same time, vector semantic similarity was used to select the text blocks most relevant to the topic, which were then optimized and stored. Then, single-hop and multi-hop retrieval were designed and applied according to the complexity of the question. Single-hop retrieval used the BM25 algorithm to match information extracted from the query against document content in the Elasticsearch index, feeding the results into the LLM to enhance answer generation. Multi-hop retrieval first converted user queries into structured conditions and semantic vector representations. Results retrieved from different knowledge bases were then fused using reciprocal rank fusion (RRF) and fed into the LLM.

[Results and Discussions] The proposed method was compared experimentally with multiple baseline approaches across different query types and query complexities. The results demonstrated that the proposed method achieved accuracy and F1 improvements of 0.193 and 0.170, respectively, on the Qwen1.5-7B-Chat model. Compared with the improved methods Self-RAG and Adaptive-RAG, AHR-RAG maintained low response times while achieving F1 improvements of 0.05 and 0.021, respectively, with an accuracy as high as 0.896. For multi-type question-answering tasks, compared with the Naive-RAG method that relied solely on prior knowledge, AHR-RAG achieved accuracy improvements of 0.231, 0.123, and 0.157 for comparison, judgment, and selection query types, respectively. AHR-RAG also demonstrated significant advantages in parsing complex semantics. In single-hop queries, its accuracy reached 0.921, a 0.29 improvement over Adaptive-RAG. In multi-hop query scenarios, its accuracy reached 0.748, with gains of 0.082 and 0.059 over Self-RAG and Adaptive-RAG, respectively. In retrieval-augmented generation, optimizing the prompt strategy gave AHR-RAG a 0.013 increase in accuracy and a 0.09 improvement in F1 compared with feeding retrieval results directly to the model.

[Conclusions] The proposed method adapts well to diverse query types and excels at reasoning over complex queries such as multi-hop searches. It delivers significant advantages in the accuracy, relevance, and comprehensiveness of generated answers, producing responses with stronger logical coherence and richer content. Future work will explore integrating multimodal knowledge bases.
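
For readers unfamiliar with two of the steps named in the abstract, fixed-length segmentation with overlap and reciprocal rank fusion (RRF), the following is a minimal illustrative sketch and not the authors' implementation. The function names, the character-based chunk size, the document identifiers, and the constant k = 60 (a commonly used RRF default) are all assumptions made for illustration; the abstract does not state these details.

```python
from collections import defaultdict


def chunk_with_overlap(text, size, overlap):
    """Split text into fixed-length chunks; consecutive chunks share
    `overlap` characters so that sentences cut at a boundary still
    appear intact in at least one chunk (assumed character-based)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.
    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k = 60 is an assumed default, not a value taken from the paper."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    print(chunk_with_overlap("abcdefghij", size=4, overlap=1))
    # -> ['abcd', 'defg', 'ghij']

    # Hypothetical result lists from two different knowledge bases.
    structured_hits = ["d2", "d1", "d3"]
    vector_hits = ["d1", "d3", "d4"]
    print(reciprocal_rank_fusion([structured_hits, vector_hits]))
    # -> ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so result lists whose raw scores are not comparable (for example, keyword-match scores and vector similarities) can be merged without score normalization.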

Key words: adaptive hybrid retrieval, text blocking, pests and diseases, intelligent Q&A

CLC Number: