
Smart Agriculture

   

ReG-RAG: A Large Language Model-based Question Answering Framework with Query Rewriting and Knowledge Graph Enhancement

LI Xiaoyu, ZHANG Jiayi, ZHANG Haitao, NIE Xiaoyi

  1. College of Information and Intelligence, Hunan Agricultural University, Changsha, Hunan 410128, China
  • Received: 2025-07-04 Online: 2025-10-16
  • Foundation items: Hainan Provincial Sanya Yazhou Bay Science and Technology Innovation Joint Project (ZDYF2025GXJS154)
  • About author:

    LI Xiaoyu, E-mail:

  • Corresponding author:
    NIE Xiaoyi, E-mail:

Abstract:

[Objective] With the rapid advancement of large language models (LLMs), intelligent question-answering (QA) systems have attracted widespread attention in specialized fields such as agriculture, medicine, and finance. However, existing systems often struggle with queries that contain many technical terms and complex semantic expressions, which leads to unclear semantic comprehension, insufficient retrieval coverage, and low answer accuracy, and thereby limits their practical application. To address these challenges, a retrieval-augmented generation approach named ReG-RAG (Rewrite and Graph-enhanced Retrieval-Augmented Generation) is proposed. It integrates query rewriting and knowledge-graph enhancement to improve the accuracy and interpretability of complex QA tasks by optimizing query semantics and incorporating structured knowledge. Rapeseed cultivation was selected as a case study to demonstrate the framework's effectiveness and generalizability in a specialized domain.

[Methods] The ReG-RAG framework comprised three hierarchical layers: query rewriting, dual-channel knowledge retrieval, and knowledge-aware generation. In the query rewriting layer, the system used the T5 model to semantically normalize the original query with a predefined set of prompts designed to standardize terminology and clarify user intent. This instruction set, covering five categories of rapeseed knowledge (varieties, breeding, cultivation management, pest and disease control, and nutrient regulation), was constructed from domain literature and expert contributions to enhance query clarity and matching accuracy. In the retrieval layer, the system performed text-vector retrieval and knowledge-graph subgraph retrieval in parallel. The text channel retrieved the top-k (k = 3) most relevant documents by cosine similarity, while the knowledge-graph channel extracted entities and relations using an LLM and, together with a structured metadata annotation system for graph construction, obtained the relevant neighborhood subgraphs. The study adopted LightRAG's document segmentation and entity extraction methods to ensure knowledge integrity and traceability. In the generation layer, an attention-routing mechanism was designed to integrate the textual and graph-based retrieval results into a multi-source knowledge base. The generation model then produced the final answers, with dynamic weight allocation emphasizing the most relevant knowledge nodes to ensure contextual coherence and factual accuracy.

[Results and Discussions] Experiments were conducted on a rapeseed knowledge base (319 QA pairs) and the public WikiEval dataset, with RAG-Fusion, Decomposition, Step Back, HyDE, and Multi-Query as comparative methods. ReG-RAG outperformed all baseline methods across multiple evaluation metrics. On the rapeseed dataset, ReG-RAG achieved a context precision of 0.9042, context recall of 0.8421, faithfulness of 0.9886, and answer relevance of 0.9869, significantly exceeding existing approaches. On the WikiEval dataset, it attained corresponding scores of 0.8620, 0.8387, 0.9694, and 0.9421, reflecting improvements of approximately 2.6%–4.7% over the best-performing baseline methods. Case analyses revealed that, for pest and disease control queries, ReG-RAG produced logically consistent and complete answers by integrating graph-based reasoning, whereas traditional methods often provided fragmented information. In variety-improvement tasks, responses from RAG-Fusion and LightRAG suffered from information gaps, while ReG-RAG, aided by query rewriting and knowledge fusion, generated more professional and coherent answers. These findings indicated that ReG-RAG offered notable advantages in resolving query ambiguity, improving knowledge coverage, and enhancing answer faithfulness.

[Conclusions] The proposed ReG-RAG framework effectively addressed semantic ambiguity, inadequate retrieval, and generation inaccuracies in domain-specific QA by integrating query rewriting and knowledge-graph enhancement. Experimental validation demonstrated that the method significantly outperformed mainstream baseline approaches on both specialized and general evaluation datasets, exhibiting strong adaptability and generalizability. Limitations of this study included the relatively small data scale and the need for further optimization of inference efficiency and deployment costs. Future work will extend the framework to additional crops such as wheat and rice, expand knowledge coverage, and investigate model compression and edge-deployment strategies to enhance practical applicability. The architecture and training procedure of the query-rewriting module will also be further optimized to strengthen robustness and generalization in complex semantic scenarios.
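
The text channel described in the Methods (top-k retrieval by cosine similarity, with k = 3) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `top_k_documents` and the toy two-dimensional vectors are assumptions made for illustration only, and a real system would score embeddings produced by an encoder model rather than hand-written vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,  # the abstract reports k = 3 for the text channel
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    # Sort by similarity, highest first; Python's sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for encoded document chunks.
    toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [2.0, 0.1]]
    toy_query = [1.0, 0.0]
    for index, score in top_k_documents(toy_query, toy_docs, k=3):
        print(index, round(score, 4))
```

In the framework as described, the documents selected this way would be combined with the subgraphs returned by the knowledge-graph channel before generation; that fusion step is not shown here because the abstract does not specify how the attention-routing mechanism works.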

Key words: large language models, query rewriting, knowledge graphs, retrieval-augmented generation, question answering systems

CLC Number: