
Smart Agriculture


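ReG-RAG: A Large-Model Question-Answering Generation Method Integrating Query Rewriting and Knowledge-Graph Enhancement

Li Xiaoyu, Zhang Jiayi, Zhang Haitao, Nie Xiaoyi

  1. College of Information and Intelligence, Hunan Agricultural University, Changsha, Hunan 410128, China
  • Received: 2025-07-04 Published: 2025-10-16
  • Funding:
    Hainan Province Key R&D Program, Sanya Yazhou Bay Science and Technology City Science and Technology Innovation Joint Project (ZDYF2025GXJS154)
  • About the author:

    Li Xiaoyu, master's student; research interests: agricultural large language models and knowledge graphs. E-mail:

  • Corresponding author:
    Nie Xiaoyi, Ph.D., associate professor; research interests: artificial intelligence and smart agriculture. E-mail: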

ReG-RAG: A Large Language Model-based Question Answering Framework with Query Rewriting and Knowledge Graph Enhancement

LI Xiaoyu, ZHANG Jiayi, ZHANG Haitao, NIE Xiaoyi

  1. College of Information and Intelligence, Hunan Agricultural University, Changsha, Hunan 410128, China
  • Received: 2025-07-04 Online: 2025-10-16
  • Foundation items: Hainan Provincial Sanya Yazhou Bay Science and Technology Innovation Joint Project (ZDYF2025GXJS154)
  • About author:

    LI Xiaoyu, E-mail:

  • Corresponding author:
    NIE Xiaoyi, E-mail:

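Summary:

[Objective/Significance] Building efficient, scalable domain question-answering (QA) systems is a key direction in current applied research on large language models (LLMs). However, when handling questions from specialized domains, existing QA systems often face ambiguous query semantics, insufficient knowledge coverage, and inaccurate generated answers. [Methods] A retrieval-augmented generation method that combines query rewriting with knowledge-graph enhancement, ReG-RAG (Rewrite and Graph-Enhanced Retrieval-Augmented Generation), is proposed as a general solution for complex QA tasks. Built around a large language model, the method uses pseudo-document generation and semantic rewriting to make queries clearer and retrieval more accurate; it extracts triples and constructs a structured knowledge graph to strengthen semantic association and reasoning; and it integrates multi-source knowledge into high-quality answers through dual-channel knowledge retrieval and an attention-routing generation module. Experiments were conducted on a rapeseed-domain dataset and the public WikiEval dataset. [Results and Discussions] On the rapeseed dataset, ReG-RAG reached context precision, context recall, answer faithfulness, and answer relevance of 0.904 2, 0.842 1, 0.988 6, and 0.986 9, respectively, significantly outperforming existing methods. On the WikiEval dataset, the four metrics were 0.862 0, 0.838 7, 0.969 4, and 0.942 1, an improvement of 2.6%–4.7% over the best baseline method. [Conclusions] These results confirm that ReG-RAG delivers superior performance and good generalization in both vertical-domain and general settings, offering a new path toward efficient, scalable intelligent QA systems.

Keywords: large language models, query rewriting, knowledge graph, retrieval-augmented generation, question-answering systems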
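
The knowledge-graph step described above (triples organized into a graph, from which a neighborhood around the query's entities is retrieved) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triples, relation names, and the functions `build_graph` and `neighborhood_subgraph` are all hypothetical, and in the paper the triples are extracted by an LLM rather than written by hand.

```python
from collections import defaultdict, deque

# Hypothetical triples for illustration only; not taken from the paper's knowledge graph.
TRIPLES = [
    ("rapeseed", "affected_by", "sclerotinia stem rot"),
    ("sclerotinia stem rot", "controlled_by", "crop rotation"),
    ("rapeseed", "requires", "boron"),
    ("boron", "deficiency_causes", "flowering without seed set"),
    ("wheat", "affected_by", "stripe rust"),
]


def build_graph(triples):
    """Index triples by both endpoints so neighbours can be reached in either direction."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((head, relation, tail))
        graph[tail].append((head, relation, tail))
    return graph


def neighborhood_subgraph(graph, seeds, hops=1):
    """Collect every triple within `hops` edges of any seed entity (breadth-first)."""
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found = []
    while queue:
        entity, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for triple in graph.get(entity, []):
            if triple not in found:
                found.append(triple)
            head, _, tail = triple
            neighbour = tail if entity == head else head
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


graph = build_graph(TRIPLES)
subgraph = neighborhood_subgraph(graph, ["rapeseed"], hops=1)
for head, relation, tail in subgraph:
    print(head, relation, tail)
```

Raising `hops` widens the retrieved context at the cost of pulling in less relevant triples; the hop limit used in the paper is not stated in the abstract, so `hops=1` here is only a placeholder.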

Abstract:

[Objective] With the rapid advancement of large language models (LLMs), intelligent question-answering (QA) systems have attracted widespread attention in specialized fields such as agriculture, medicine, and finance. However, existing systems often struggle with queries that contain many technical terms and complex semantic expressions, which leads to unclear semantic comprehension, insufficient retrieval coverage, and low answer accuracy, and thereby limits their practical application. To address these challenges, a retrieval-augmented generation approach named ReG-RAG (Rewrite and Graph-enhanced Retrieval-Augmented Generation) is proposed. It integrates query rewriting and knowledge-graph enhancement to improve the accuracy and interpretability of complex QA tasks by optimizing query semantics and incorporating structured knowledge. Rapeseed cultivation was selected as a case study to demonstrate the framework's effectiveness and generalizability in a specialized domain.

[Methods] The ReG-RAG framework comprised three hierarchical layers: query rewriting, dual-channel knowledge retrieval, and knowledge-aware generation. In the query rewriting layer, the system used the T5 model to semantically normalize the original query with a predefined set of prompts designed to standardize terminology and clarify user intent. The instruction set, which covered five categories of rapeseed knowledge (varieties, breeding, cultivation management, pest and disease control, and nutrient regulation), was constructed from domain literature and expert contributions to enhance query clarity and matching accuracy. In the retrieval layer, the system performed text-vector retrieval and knowledge-graph subgraph retrieval in parallel. The text channel retrieved the top-k (k = 3) most relevant documents by cosine similarity, while the knowledge-graph channel extracted entities and relations using an LLM and, together with a structured metadata annotation system for graph construction, obtained relevant neighborhood subgraphs. The study adopted LightRAG's document segmentation and entity extraction methods to ensure knowledge integrity and traceability. In the generation layer, an attention-routing mechanism was designed to integrate textual and graph-based retrieval results into a multi-source knowledge base. The generation model then produced the final answers, with dynamic weight allocation emphasizing the most relevant knowledge nodes to ensure contextual coherence and factual accuracy.

[Results and Discussions] Experiments were conducted on a rapeseed knowledge base (319 QA pairs) and the public WikiEval dataset, with RAG-Fusion, Decomposition, Step Back, HyDE, and Multi-Query as comparison methods. ReG-RAG outperformed all baseline methods across multiple evaluation metrics. On the rapeseed dataset, ReG-RAG achieved context precision of 0.904 2, context recall of 0.842 1, faithfulness of 0.988 6, and answer relevance of 0.986 9, significantly exceeding existing approaches. On the WikiEval dataset, it attained corresponding scores of 0.862 0, 0.838 7, 0.969 4, and 0.942 1, improvements of approximately 2.6%–4.7% over the best-performing baseline methods. Case analyses revealed that ReG-RAG produced logically consistent and complete answers by integrating graph-based reasoning for pest and disease control queries, whereas traditional methods often provided fragmented information. In variety-improvement tasks, responses from RAG-Fusion and LightRAG suffered from information gaps, while ReG-RAG, aided by query rewriting and knowledge fusion, generated more professional and coherent answers. These findings indicated that ReG-RAG offered notable advantages in resolving query ambiguity, improving knowledge coverage, and enhancing answer faithfulness.

[Conclusions] The proposed ReG-RAG framework effectively addressed semantic ambiguity, inadequate retrieval, and generation inaccuracies in domain-specific QA by integrating query rewriting and knowledge-graph enhancement. Experimental validation demonstrated that the method significantly outperformed mainstream baseline approaches on both specialized and general evaluation datasets, exhibiting strong adaptability and generalizability. Limitations of this study included the relatively small data scale and the need for further optimization of inference efficiency and deployment costs. Future work will extend the framework to additional crops such as wheat and rice, expand knowledge coverage, and investigate model compression and edge-deployment strategies to enhance practical applicability. Further optimization of the query-rewriting module's architecture and training procedure will also be pursued to strengthen robustness and generalization in complex semantic scenarios.

Key words: large language models, query rewriting, knowledge graphs, retrieval-augmented generation, question answering systems
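
The text channel in the Methods (top-k retrieval with k = 3 by cosine similarity) and the idea of dynamic weight allocation over retrieved evidence can be illustrated with a short sketch. Several things here are assumptions rather than details from the abstract: the toy 3-dimensional embeddings, the document and triple identifiers, the graph-channel scores, and above all the use of a temperature-scaled softmax as a stand-in for the attention-routing mechanism, whose actual form the abstract does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k most similar documents, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def softmax_weights(scores, temperature=1.0):
    """Turn relevance scores into positive weights that sum to 1 (assumed weighting scheme)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-dimensional embeddings, invented for illustration.
DOC_VECS = {
    "doc_sowing": [0.9, 0.1, 0.0],
    "doc_pests": [0.1, 0.9, 0.2],
    "doc_fertilizer": [0.2, 0.3, 0.9],
    "doc_harvest": [0.7, 0.2, 0.1],
}
query = [0.1, 1.0, 0.1]

text_hits = top_k(query, DOC_VECS, k=3)
# Hypothetical relevance scores for two triples returned by the graph channel.
graph_hits = [("triple_pest_control", 0.95), ("triple_variety", 0.40)]

# Merge both channels and weight each piece of evidence by its relevance.
evidence = text_hits + graph_hits
weights = softmax_weights([score for _, score in evidence], temperature=0.1)
for (source, score), weight in zip(evidence, weights):
    print(f"{source}: score={score:.3f} weight={weight:.3f}")
```

A lower `temperature` concentrates weight on the highest-scoring evidence, which loosely mirrors the abstract's description of emphasizing the most relevant knowledge nodes; a production system would use learned embeddings and a vector index in place of the dictionary scan.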

CLC number: