
Smart Agriculture


自适应混合检索增强大模型的农作物病虫害智能问答方法

杨俊1, 杨婉霞1, 杨森1, 何亮2,3, 张娣1

  1. 甘肃农业大学机电工程学院,甘肃 兰州 730070,中国
  2. 新疆大学 计算机科学与技术学院,新疆 乌鲁木齐 830046,中国
  3. 清华大学电子工程系北京信息科学与技术国家研究中心,北京 100084,中国
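  • Received: 2025-06-13 Published: 2025-10-09
  • Funding:
    National Science and Technology Major Project on New Generation Artificial Intelligence (2022ZD0115801)
  • About the author:

    YANG Jun, master's student; research interest: natural language processing. E-mail:

  • Corresponding author:
    YANG Wanxia, Ph.D., professor; research interest: agricultural informatization. E-mail: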

Intelligent Q&A Method for Crop Pests and Diseases Using LLM Augmented by Adaptive Hybrid Retrieval

YANG Jun1, YANG Wanxia1, YANG Sen1, HE Liang2,3, ZHANG Di1

  1. College of Electrical and Mechanical Engineering, Gansu Agricultural University, Lanzhou 730070, China
  2. School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  3. Beijing National Research Center for Information Science and Technology / Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
  • Received: 2025-06-13 Online: 2025-10-09
  • Foundation items: The National Key R&D Program of China (2022ZD0115801)
  • About author:

    YANG Jun, E-mail:

  • Corresponding author:
    YANG Wanxia, E-mail:

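Summary:

[Objective/Significance] Agricultural big data contains dispersed, heterogeneous, and unlinked agricultural knowledge whose potential value is largely untapped. To exploit it, a knowledge base is built and combined with retrieval techniques so that a large model outputs professional agricultural knowledge, providing an effective means of bringing that knowledge quickly into production practice. [Methods] A retrieval-augmented large-model method for intelligent question answering on crop pests and diseases is proposed. By building its own knowledge base and jointly optimizing the chunking strategy, the adaptive retrieval mechanism, and structured prompt engineering, the method lets domain knowledge on agricultural pests and diseases effectively enhance the large model for accurate, professional question answering. Specifically, an adaptive hybrid retrieval-augmented generation (AHR-RAG) method is proposed. First, an overlap mechanism is introduced into fixed-length chunking to mitigate semantic fragmentation; at the same time, vector semantic similarity is used to select the text chunks most relevant to the topic for storage. Dynamically routed single-hop retrieval (BM25 algorithm) and multi-hop retrieval are designed according to question complexity. The proposed method is then compared experimentally with several baseline methods across different query types and query complexities. [Results and Discussion] The proposed method performs best on the Qwen1.5-7B-Chat model, with an accuracy of 89.6%. Its accuracy on single-hop and multi-hop queries reaches 0.921 and 0.748, respectively, and its multi-hop accuracy is 0.082 and 0.059 higher than that of Self-RAG and Adaptive-RAG, respectively, indicating that the method reasons better over complex queries such as multi-hop queries. [Conclusions] The proposed method shows significant advantages in the accuracy, relevance, and comprehensiveness of generated answers. Future work will explore integrating multimodal knowledge bases.

Keywords: adaptive hybrid retrieval, text chunking, pests and diseases, intelligent Q&A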
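
The overlap mechanism for fixed-length chunking mentioned in the abstract can be illustrated with a minimal sketch. The function name `chunk_with_overlap` and the `chunk_size` and `overlap` values are assumptions made for illustration; the abstract does not state the sizes actually used, and the subsequent similarity-based filtering of chunks is not shown.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-length chunks in which each chunk repeats the
    last `overlap` characters of the previous one.

    chunk_size and overlap are illustrative values, not taken from the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk is produced that only repeats earlier content.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Because neighbouring chunks share `overlap` characters, a sentence cut at one chunk boundary still appears intact in the adjacent chunk, which is the sense in which overlap mitigates semantic fragmentation.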

Abstract:

[Objective] Extracting valuable knowledge from vast amounts of dispersed, heterogeneous, and unstructured agricultural big data, correlating and structuring it, and using it to enhance large models in intelligent question-answering systems enables the effective delivery of services to everyone working in agriculture and can rapidly advance the scientific, precision-oriented development of agricultural production. Existing agricultural Q&A systems lack sufficient semantic understanding of complex symptoms, while general-purpose large language models (LLMs) produce factual hallucinations because their training data cover the domain incompletely. This study aims to address the insufficient scale and low quality of knowledge bases constructed for the agricultural field.

[Methods] First, pest and disease data were collected for five typical crops: wheat, rice, corn, potatoes, and cotton. Outliers were identified and removed through manual verification, ultimately yielding 87 901 unstructured data entries. Then, a few-shot learning model was employed to extract the entities defined in the pattern layer, and these entities were aligned using BERT semantic vectors and LLM prompt engineering, yielding a triplet knowledge base of 916 239 entries for knowledge retrieval. A knowledge retrieval-augmented LLM approach for intelligent Q&A on crop pests and diseases was proposed, specifically the adaptive hybrid retrieval-augmented generation (AHR-RAG) approach. First, an overlapping mechanism was introduced into fixed-length segmentation to mitigate semantic fragmentation. At the same time, vector semantic similarity was used to select the text blocks most relevant to the topic for optimization and storage. Single-hop and multi-hop retrieval were then designed according to the complexity of the question. Single-hop retrieval used the BM25 algorithm to match information extracted from the query against document content in the Elasticsearch index and fed the results into the LLM to enhance answer generation. Multi-hop retrieval first converted user queries into structured conditions and semantic vector representations. Results retrieved from the different knowledge bases were then fused using reciprocal rank fusion (RRF) and fed into the LLM.

[Results and Discussions] The proposed method was compared experimentally with multiple baseline approaches across different query types and query complexities. On the Qwen1.5-7B-Chat model, the proposed method achieved accuracy and F1 improvements of 0.193 and 0.170, respectively. Compared with the improved methods Self-RAG and Adaptive-RAG, AHR-RAG maintained low response times while achieving F1 improvements of 0.05 and 0.021, respectively, with an accuracy as high as 0.896. For multi-type question-answering tasks, compared with the Naive-RAG method that relied solely on prior knowledge, AHR-RAG achieved accuracy improvements of 0.231, 0.123, and 0.157 for comparison, judgment, and selection query types, respectively. AHR-RAG also demonstrated significant advantages in parsing complex semantics. In single-hop queries, its accuracy reached 0.921, a 0.29 improvement over Adaptive-RAG. In multi-hop queries, its accuracy reached 0.748, a gain of 0.082 and 0.059 over Self-RAG and Adaptive-RAG, respectively. In retrieval-augmented generation, optimizing the prompt strategy gave AHR-RAG a 0.013 increase in accuracy and a 0.09 improvement in F1 compared with feeding retrieval results directly to the model.

[Conclusions] The proposed method adapts well to diverse query types and excels at reasoning over complex queries such as multi-hop queries. It delivers significant advantages in the accuracy, relevance, and comprehensiveness of generated answers, producing responses with stronger logical coherence and richer content. Future work will explore integrating multimodal knowledge bases.
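
To make the single-hop step more concrete, the following is a minimal sketch of BM25 scoring over tokenized documents. It is a stand-alone illustration of the ranking function, not the authors' Elasticsearch setup: the function name `bm25_scores`, the toy tokens, and the parameters `k1=1.2` and `b=0.75` (commonly used defaults) are assumptions, since the abstract does not report the configuration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    docs is a list of token lists. k1 and b are commonly used defaults,
    assumed here; they are not values reported in the paper.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by length via b.
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

Documents that share no term with the query score zero, and a rare matching term contributes more than a common one, which is why keyword-heavy single-hop questions suit this kind of lexical retrieval.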

Key words: adaptive hybrid retrieval, text blocking, pests and diseases, intelligent Q&A
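
The reciprocal rank fusion (RRF) step used to merge multi-hop retrieval results can be sketched as follows. This is the standard RRF formulation, in which each document receives the sum of 1 / (k + rank) over the rankings that contain it; the constant `k=60` is a conventional choice assumed here, and the function name and document identifiers are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. k=60 is a conventional value, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a structured-condition search and a vector search.
structured_hits = ["d1", "d2", "d3"]
vector_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([structured_hits, vector_hits])
```

RRF uses only rank positions, so scores from retrievers with incompatible scales (for example, exact structured matches and cosine similarities) can be combined without normalization.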

CLC number: