
Smart Agriculture ›› 2025, Vol. 7 ›› Issue (1): 20-32. doi: 10.12133/j.smartag.SA202410025

• Special Issue: Intelligent Agricultural Knowledge Services and Smart Unmanned Farms (Part 2) •

Agricultural Large Language Model Based on Precise Knowledge Retrieval and Knowledge Collaborative Generation

JIANG Jingchi1,2, YAN Lian1, LIU Jie1,2

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
  2. National Key Laboratory of Smart Farm Technologies and Systems, Harbin 150001, China
  • Received: 2024-10-20 Online: 2025-01-30
  • Foundation items: National Key Research and Development Program of China (ZDYF20220008); Heilongjiang Provincial Science and Technology Program Project (GJLX20240004)
  • About author:
    JIANG Jingchi, Ph.D., associate professor; research interests: smart agriculture and large language models. E-mail:
  • Corresponding author:
    LIU Jie, Ph.D., professor; research interests: Internet of Things and artificial intelligence. E-mail:

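Abstract (translated from the Chinese):

[Objective/Significance] Large language models (LLMs), with their strong cognitive understanding and content-generation capabilities, are developing rapidly and are expected to become a new research paradigm in smart agriculture. However, because general-purpose LLMs lack agricultural domain knowledge, they often produce factually incorrect or incomplete answers to specialized questions. To improve the adaptability of large models in agriculture, this study proposes a knowledge graph-guided agricultural LLM, KGLLM. [Methods] The model filters knowledge on the basis of information entropy and, during decoding, explicitly uses the semantic information of the knowledge graph to constrain content generation. Specifically, key entities in the input question are linked to an agricultural knowledge graph to form knowledge reasoning paths and question-answering evidence. To ensure that this external knowledge is useful, the entropy difference of the model's output before and after introducing each piece of knowledge is evaluated, and knowledge that does not increase the certainty of the answer is filtered out. The retained knowledge paths are then used to adjust the vocabulary probabilities so that words highly relevant to the knowledge are more likely to be output, giving the knowledge graph explicit guidance over the LLM. [Results and Discussion] The knowledge graph guidance technique was implemented on five mainstream general-purpose LLMs, including the open-source models Baichuan, ChatGLM, and Qwen, and was compared with the best-performing knowledge graph retrieval-augmented generation techniques. The experiments show that the proposed method significantly improves fluency, accuracy, factuality, and domain faithfulness; compared with GPT-4o, Mean BLEU, ROUGE, and BERTScore improved on average by 2.5923, 2.8151, and 9.84%, respectively. Ablation experiments also show that the knowledge-guided agricultural LLM not only filters redundant knowledge but also effectively adjusts the vocabulary output distribution during decoding, which helps improve the adaptability of general-purpose LLMs to agriculture and the interpretability of their answers. [Conclusions] This study offers ideas that later agricultural LLMs can draw on, and indicates that knowledge graph guidance has potential value for improving a model's domain adaptability and answer quality.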
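The entropy-based filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that the model's output can be summarized as a discrete probability distribution (for example, over candidate answers or next tokens) both without external knowledge and with each candidate knowledge path added. The function names, the dictionary layout, and the `min_gain` threshold are hypothetical and are not taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def filter_knowledge(base_probs, probs_with_knowledge, min_gain=0.0):
    """Keep knowledge items whose inclusion lowers the output entropy.

    base_probs: output distribution without any external knowledge.
    probs_with_knowledge: dict mapping a knowledge item (hypothetical label)
        to the output distribution obtained when that item is supplied.
    min_gain: assumed threshold; an item is kept only if it reduces the
        entropy by more than this amount.
    """
    h_base = entropy(base_probs)
    kept = []
    for item, probs in probs_with_knowledge.items():
        gain = h_base - entropy(probs)  # positive gain means a more certain output
        if gain > min_gain:
            kept.append(item)
    return kept


# Toy usage: path_a sharpens the distribution, path_b leaves it unchanged.
base = [0.25, 0.25, 0.25, 0.25]
candidates = {
    "path_a": [0.7, 0.1, 0.1, 0.1],
    "path_b": [0.25, 0.25, 0.25, 0.25],
}
print(filter_knowledge(base, candidates))  # prints ['path_a']
```

In this toy case only `path_a` is retained, because it is the only item that makes the output distribution more peaked than the baseline.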


Abstract:

[Objective] Large language models (LLMs) are advancing rapidly and, with their strong cognitive understanding and content-generation capabilities, are a promising new research paradigm for smart agriculture. However, because general-purpose LLMs lack domain-specific agricultural knowledge, they often give factually incorrect or incomplete answers to specialized queries. Improving the adaptability and response quality of LLMs in agricultural applications has therefore become an important research direction. [Methods] To improve the adaptability and precision of LLMs in agricultural applications, a knowledge graph-guided agricultural LLM (KGLLM) was proposed. The method used information entropy to filter knowledge and, during the decoding phase, explicitly constrained content generation with semantic information from an agricultural knowledge graph. Key entities in the input question were first identified and linked to the agricultural knowledge graph, which yielded knowledge inference paths and question-answering rationales. To ensure that the external knowledge incorporated into the model was valid and reliable, the entropy difference in the model's outputs before and after introducing each piece of knowledge was evaluated, and knowledge that did not increase the certainty of the answers was filtered out. The knowledge paths that passed this entropy evaluation were used to adjust the token prediction probabilities, prioritizing outputs closely aligned with the structured knowledge. This allowed the knowledge graph to exert explicit guidance over the LLM's outputs, improving accuracy and relevance in agricultural applications.
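The decoding-stage guidance can be sketched in the same spirit. The abstract does not state the exact rule used to adjust token probabilities, so the example below assumes a simple additive bias: each token's logit is raised in proportion to a relevance score between the token and the retained knowledge paths, and the result is renormalized with a softmax. The names `guide_logits`, `relevance`, and `alpha` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guide_logits(logits, relevance, alpha=2.0):
    """Bias token logits toward knowledge-relevant tokens, then normalize.

    logits: raw model scores, one per vocabulary token.
    relevance: assumed per-token relevance to the retained knowledge paths,
        in [0, 1] (for example, a semantic similarity score).
    alpha: assumed guidance strength; alpha = 0 leaves the model unchanged.
    """
    if len(logits) != len(relevance):
        raise ValueError("logits and relevance must have the same length")
    biased = [l + alpha * r for l, r in zip(logits, relevance)]
    return softmax(biased)


# Toy vocabulary: the first two tokens appear in the knowledge path.
vocab = ["rice", "blast", "weather", "car"]
logits = [1.0, 1.0, 1.0, 1.0]
relevance = [0.9, 0.8, 0.1, 0.0]
for token, p in zip(vocab, guide_logits(logits, relevance)):
    print(f"{token}: {p:.3f}")
```

With uniform logits, every token starts at probability 0.25; after the bias, the knowledge-related tokens gain probability mass at the expense of the unrelated ones. An additive bias was chosen here only because it is easy to inspect; other adjustment rules would fit the abstract's description equally well.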
[Results and Discussions] The proposed knowledge graph-guided technique was implemented on five mainstream general-purpose LLMs, including open-source models such as Baichuan, ChatGLM, and Qwen, and was compared with state-of-the-art knowledge graph-augmented generation methods. The results showed that the proposed approach significantly improved fluency, accuracy, factual correctness, and domain relevance. Compared with GPT-4o, it achieved average improvements of 2.5923 in Mean BLEU, 2.8151 in ROUGE, and 9.84% in BERTScore. These improvements indicate that the approach effectively leveraged agricultural domain knowledge to refine the outputs of general-purpose LLMs, making them more suitable for agricultural applications. Ablation experiments further showed that the knowledge-guided agricultural LLM not only filtered out redundant knowledge but also effectively adjusted token prediction distributions during the decoding phase. This enhanced the adaptability of general-purpose LLMs in agricultural contexts and significantly improved the interpretability of their responses. By comparing information entropy, the proposed knowledge filtering and knowledge graph-guided decoding method identified and selected the knowledge that carried more information. Compared with existing technologies in the agricultural field, the method significantly reduced the likelihood of "hallucination" during generation. Furthermore, the guidance of the knowledge graph kept the model's responses closely tied to professional agricultural knowledge, avoiding the vague and inaccurate responses produced from general knowledge alone. For instance, in pest and disease control, the model could accurately identify the type of crop disease and the corresponding control measures from the guided knowledge path, thereby providing more reliable decision support. [Conclusions] This study provides a valuable reference for the construction of future agricultural large language models, indicating that the knowledge graph-guided method has the potential to enhance the domain adaptability and answer quality of models. Future research can explore similar knowledge-guided strategies in other vertical fields to enhance the adaptability and practicality of LLMs across professional domains.

Key words: knowledge graph, agricultural large language model, information entropy, semantic similarity, knowledge guidance

Chinese Library Classification (CLC) number: