Agricultural Large Language Model Based on Precise Knowledge Retrieval and Knowledge Collaborative Generation

doi:10.12133/j.smartag.SA202410025

Abstract

[Objective] Large language models (LLMs), with their strong language understanding and content generation capabilities, are emerging as a promising research paradigm in smart agriculture. However, general-purpose LLMs lack domain-specific agricultural knowledge and therefore often produce factual errors or incomplete answers to specialized agricultural queries. Improving the adaptability and response quality of LLMs in agricultural applications has thus become an important research direction. [Methods] To improve the adaptability and precision of LLMs in agricultural applications, a knowledge graph-guided agricultural LLM (KGLLM) was proposed. The method used information entropy to filter knowledge and applied explicit constraints on content generation during the decoding phase, using semantic information derived from an agricultural knowledge graph. Key entities in the input question were first identified and linked to the agricultural knowledge graph, from which knowledge inference paths and question-answering rationales were formed. To ensure the validity and reliability of the external knowledge incorporated into the model, the entropy difference in the model's outputs before and after the introduction of each piece of knowledge was evaluated, and knowledge that did not increase the certainty of the answer was filtered out. The knowledge paths that passed this entropy evaluation were then used to adjust the token prediction probabilities, prioritizing outputs closely aligned with the structured knowledge. In this way, the knowledge graph exerted explicit guidance over the LLM's outputs, improving accuracy and relevance in agricultural applications. 
[Results and Discussions] The proposed knowledge graph-guided technique was implemented on five mainstream general-purpose LLMs, including open-source models such as Baichuan, ChatGLM, and Qwen, and was compared with state-of-the-art knowledge graph-augmented generation methods. The results showed that the proposed approach significantly improved fluency, accuracy, factual correctness, and domain relevance. Compared with GPT-4o, the proposed method improved by an average of 2.592 3 in Mean BLEU, 2.815 1 in ROUGE, and 9.84% in BertScore. These improvements indicate that the approach effectively leverages agricultural domain knowledge to refine the outputs of general-purpose LLMs, making them more suitable for agricultural applications. Ablation experiments further confirmed that KGLLM not only filtered out redundant knowledge but also effectively adjusted token prediction distributions during the decoding phase, which enhanced the adaptability of general-purpose LLMs in agricultural contexts and significantly improved the interpretability of their responses. By comparing information entropy, the proposed knowledge filtering and knowledge graph-guided decoding method identified and selected the knowledge that carried more information. Compared with existing technologies in the agricultural field, this method significantly reduced the likelihood of "hallucination" during generation. Furthermore, the guidance of the knowledge graph kept the generated responses closely tied to professional agricultural knowledge, avoiding the vague and inaccurate responses that arise from general knowledge alone. 
For instance, in pest and disease control, the model could accurately identify the type of crop disease and the corresponding control measures from the guided knowledge path, thereby providing more reliable decision support. [Conclusions] This study provides a valuable reference for building future agricultural large language models, indicating that the knowledge graph-guided method has the potential to enhance the domain adaptability and answer quality of models. Future research can explore similar knowledge-guided strategies in other vertical fields to enhance the adaptability and practicality of LLMs across professional domains.
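
The entropy-based filtering described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the names `filter_by_entropy` and `toy_predict` are hypothetical, the model is replaced by a toy predictor that returns a probability distribution over candidate answers, and scoring each triple independently against a knowledge-free baseline with a zero threshold is an assumption made only for illustration.

```python
import math
from typing import Callable, Dict, List

Distribution = Dict[str, float]  # candidate answer -> probability


def entropy(dist: Distribution) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def filter_by_entropy(
    question: str,
    triples: List[str],
    predict: Callable[[str, List[str]], Distribution],
) -> List[str]:
    """Keep only the triples that make the model's output more certain.

    Assumption: each triple is scored on its own against the entropy of
    the output obtained without any external knowledge.
    """
    base = entropy(predict(question, []))
    kept: List[str] = []
    for triple in triples:
        with_triple = entropy(predict(question, [triple]))
        if base - with_triple > 0.0:  # entropy dropped: certainty increased
            kept.append(triple)
    return kept


def toy_predict(question: str, context: List[str]) -> Distribution:
    """Hypothetical stand-in for an LLM: sharper output if the context helps."""
    if any("neck blast" in t for t in context):
        return {"white head": 0.9, "dark-brown base": 0.1}
    return {"white head": 0.5, "dark-brown base": 0.5}


triples = [
    "(neck blast, symptom, completely white head)",
    "(seedling blast, symptom, base turns dark brown)",
]
kept = filter_by_entropy("What are the symptoms of neck blast?", triples, toy_predict)
print(kept)  # ['(neck blast, symptom, completely white head)']
```

In this toy run the first triple lowers the output entropy from ln 2 to about 0.33 nats and is kept, while the second leaves the distribution unchanged and is discarded.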

Key words: knowledge graph, agricultural large language model, information entropy, semantic similarity, knowledge guidance

CLC Number:

TP391.1

JIANG Jingchi, YAN Lian, LIU Jie. Agricultural Large Language Model Based on Precise Knowledge Retrieval and Knowledge Collaborative Generation[J]. Smart Agriculture, 2025, 7(1): 20-32.

Table 5

Ablation results for knowledge filtering based on information entropy and explicit decoding constraints of knowledge graphs

Backbone	Model	GOOGLE BLEU	BLEU_1	BLEU_2	BLEU_3	BLEU_4	Mean_BLEU	ROUGE_1	ROUGE_2	ROUGE_3	BertScore/%
Baichuan-7b	KGLLM（Ours）	1.918 0	3.482 2	2.255 1	1.703 9	1.304 8	2.186 5	2.572 6	0.271 0	2.572 6	64.53
	wo MutualI	1.680 3	3.557 7	2.158 0	1.570 5	1.181 4	2.116 9	2.158 8	0.121 1	2.158 8	62.23
	wo EConstraint	1.723 2	3.323 5	2.210 5	1.665 5	1.259 8	2.114 8	2.941 2	0.176 5	2.941 2	64.20
Baichuan-13b	KGLLM（Ours）	1.276 7	5.844 3	3.900 5	2.796 0	2.043 1	3.646 0	5.608 0	0.132 5	5.608 0	64.37
	wo MutualI	1.488 8	5.467 1	3.529 7	2.483 8	1.776 4	3.314 3	4.753 6	0.093 1	4.753 6	62.42
	wo EConstraint	1.835 7	5.556 7	3.643 6	2.620 7	1.933 2	3.438 5	5.219 8	0.239 6	5.219 8	63.95
ChatGLM3-6B	KGLLM（Ours）	2.608 6	4.105 5	2.386 5	1.716 7	1.281 6	2.372 6	2.166 8	0.151 8	2.166 8	64.03
	wo MutualI	2.480 6	3.211 9	1.967 7	1.484 3	1.155 4	1.954 8	1.996 9	0.209 9	1.996 9	62.65
	wo EConstraint	2.709 3	3.813 5	2.211 3	1.608 4	1.217 0	2.212 6	2.031 6	0.122 0	2.031 6	63.98
Qwen1.5-7B	KGLLM（Ours）	1.441 9	3.154 4	1.923 2	1.420 7	1.084 7	1.895 8	1.192 9	0.039 1	1.192 9	61.41
	wo MutualI	1.294 6	2.869 4	1.638 0	1.164 8	0.863 8	1.634 0	1.053 3	0.022 0	1.053 3	60.06
	wo EConstraint	1.157 5	2.665 4	1.616 2	1.180 0	0.888 7	1.587 6	1.383 8	0.060 2	1.383 8	60.20
Qwen1.5-14B	KGLLM（Ours）	1.806 5	5.206 8	3.195 7	2.267 7	1.652 1	3.080 6	3.234 0	0.028 2	3.234 0	61.85
	wo MutualI	1.700 9	4.782 2	2.845 6	2.010 9	1.473 4	2.778 0	2.630 2	0.017 0	2.630 2	61.11
	wo EConstraint	1.461 7	4.807 3	3.016 7	2.196 3	1.635 1	2.913 8	2.562 3	0.051 0	2.562 3	61.51
Average	KGLLM（Ours）	1.810 3	4.358 6	2.732 2	1.981 0	1.473 3	2.636 3	2.954 9	0.124 5	2.954 9	63.24
	wo MutualI	1.729 0 ^{⬇0.081 3}	3.977 7 ^{⬇0.381 0}	2.427 8 ^{⬇0.304 4}	1.742 9 ^{⬇0.238 1}	1.290 1 ^{⬇0.183 2}	2.359 6 ^{⬇0.276 7}	2.518 6 ^{⬇0.436 4}	0.092 6 ^{⬇0.031 9}	2.518 6 ^{⬇0.436 3}	61.69 ^{⬇1.54}
	wo EConstraint	1.777 5 ^{⬇0.032 9}	4.033 3 ^{⬇0.325 4}	2.539 7 ^{⬇0.192 5}	1.854 2 ^{⬇0.126 8}	1.386 8 ^{⬇0.086 5}	2.453 5 ^{⬇0.182 8}	2.827 7 ^{⬇0.127 1}	0.129 8 ^{⬆0.005 3}	2.827 7 ^{⬇0.127 1}	62.77 ^{⬇0.47}
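
The derived columns in Table 5 can be checked with a few lines of arithmetic. The sketch below assumes, based on the values themselves, that Mean_BLEU is the arithmetic mean of BLEU_1 to BLEU_4 and that the Average rows are unweighted means over the five backbones; the numbers are copied from the table with the digit-group spaces removed, and the variable names are illustrative only.

```python
from statistics import mean

# BLEU_1..BLEU_4 for KGLLM on Baichuan-7b (Table 5).
bleu_n = [3.4822, 2.2551, 1.7039, 1.3048]
mean_bleu = round(mean(bleu_n), 4)  # 2.1865, the Mean_BLEU entry of that row

# Mean_BLEU per backbone, in table order:
# Baichuan-7b, Baichuan-13b, ChatGLM3-6B, Qwen1.5-7B, Qwen1.5-14B.
kgllm = [2.1865, 3.6460, 2.3726, 1.8958, 3.0806]
wo_mutual_i = [2.1169, 3.3143, 1.9548, 1.6340, 2.7780]

avg_kgllm = round(mean(kgllm), 4)     # 2.6363, the Average row for KGLLM
avg_wo = round(mean(wo_mutual_i), 4)  # 2.3596, the Average row for wo MutualI
drop = round(avg_kgllm - avg_wo, 4)   # 0.2767, the superscripted decrease

print(mean_bleu, avg_kgllm, avg_wo, drop)
```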

Table 6

Comparison of different knowledge selection and filtering methods in evidence retrieval

Backbone	Model	GOOGLE BLEU	BLEU_1	BLEU_2	BLEU_3	BLEU_4	Mean_BLEU	ROUGE_1	ROUGE_2	ROUGE_3	BertScore/%
Baichuan-7b	KGLLM（Ours）	1.918 0	3.482 2	2.255 1	1.703 9	1.304 8	2.186 5	2.572 6	0.271 0	2.572 6	64.53
	Semantic only	1.680 3	3.557 7	2.158 0	1.570 5	1.181 4	2.116 9	2.158 8	0.121 1	2.158 8	62.23
	Random	1.481 3	3.331 1	1.994 3	1.458 4	1.107 9	1.972 9	1.885 9	0.105 2	1.885 9	62.23
Baichuan-13b	KGLLM（Ours）	1.276 7	5.844 3	3.900 5	2.796 0	2.043 1	3.646 0	5.608 0	0.132 5	5.608 0	64.37
	Semantic only	1.488 8	5.467 1	3.529 7	2.483 8	1.776 4	3.314 3	4.753 6	0.093 1	4.753 6	62.42
	Random	1.535 9	5.253 2	3.318 5	2.326 3	1.670 7	3.142 2	4.550 1	0.114 2	4.550 1	62.05
ChatGLM3-6B	KGLLM（Ours）	2.608 6	4.105 5	2.386 5	1.716 7	1.281 6	2.372 6	2.166 8	0.151 8	2.166 8	64.03
	Semantic only	2.480 6	3.211 9	1.967 7	1.484 3	1.155 4	1.954 8	1.996 9	0.209 9	1.996 9	62.65
	Random	1.797 8	2.459 6	1.530 4	1.167 3	0.905 8	1.515 8	1.728 9	0.119 6	1.728 9	62.59
Qwen1.5-7B	KGLLM（Ours）	1.441 9	3.154 4	1.923 2	1.420 7	1.084 7	1.895 8	1.192 9	0.039 1	1.192 9	61.41
	Semantic only	1.294 6	2.869 4	1.638 0	1.164 8	0.863 8	1.634 0	1.053 3	0.022 0	1.053 3	60.06
	Random	1.250 1	2.939 5	1.649 4	1.167 6	0.865 2	1.655 4	0.889 7	0.002 2	0.889 7	59.84
Qwen1.5-14B	KGLLM（Ours）	1.806 5	5.206 8	3.195 7	2.267 7	1.652 1	3.080 6	3.234 0	0.028 2	3.234 0	61.85
	Semantic only	1.700 9	4.782 2	2.845 6	2.010 9	1.473 4	2.778 0	2.630 2	0.017 0	2.630 2	61.11
	Random	1.973 3	4.038 2	2.470 8	1.823 0	1.380 6	2.428 2	1.640 9	0.050 3	1.640 9	61.66
Average	KGLLM（Ours）	1.810 3	4.358 6	2.732 2	1.981 0	1.473 3	2.636 3	2.954 9	0.124 5	2.954 9	63.24
	Semantic only	1.729 0 ^{⬇0.081 3}	3.977 7 ^{⬇0.381 0}	2.427 8 ^{⬇0.304 4}	1.742 9 ^{⬇0.238 1}	1.290 1 ^{⬇0.183 2}	2.359 6 ^{⬇0.276 7}	2.518 6 ^{⬇0.436 4}	0.092 6 ^{⬇0.031 9}	2.518 6 ^{⬇0.436 3}	61.69 ^{⬇1.54}
	Random	1.607 7 ^{⬇0.202 7}	3.604 3 ^{⬇0.754 3}	2.192 7 ^{⬇0.539 5}	1.588 5 ^{⬇0.392 5}	1.186 0 ^{⬇0.287 2}	2.142 9 ^{⬇0.493 4}	2.139 1 ^{⬇0.815 8}	0.078 3 ^{⬇0.046 2}	2.139 1 ^{⬇0.815 8}	61.67 ^{⬇1.56}
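
To make the two baselines in Table 6 concrete, the following sketch contrasts similarity-ranked selection with random selection of knowledge triples. It is an illustration under assumptions, not the paper's retrieval code: the embeddings are hand-written toy vectors rather than outputs of a sentence encoder, cosine similarity is assumed as the semantic score, and `select_semantic` and `select_random` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (triple text, toy embedding)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_semantic(query: Sequence[float], candidates: List[Candidate], k: int) -> List[str]:
    """Return the k triples whose embeddings are most similar to the query."""
    ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def select_random(candidates: List[Candidate], k: int, seed: int = 0) -> List[str]:
    """Return k triples drawn uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return [text for text, _ in rng.sample(candidates, k)]


# Toy data: the query vector stands for a question about neck blast.
query_vec = [1.0, 0.0]
candidates = [
    ("neck blast -> completely white head", [0.9, 0.1]),
    ("seedling blast -> base turns dark brown", [0.2, 0.8]),
    ("grain blast -> shriveled grains", [0.6, 0.4]),
]

top2 = select_semantic(query_vec, candidates, k=2)
rand2 = select_random(candidates, k=2)
print(top2)
```

Semantic ranking keeps the triples closest to the question but has no way to tell whether they actually make the answer more certain, which is the gap the entropy-based filter is meant to close.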

Table 7

Knowledge filtering samples based on information entropy

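Question: Rice blast can occur at every growth stage of rice. Depending on the stage and the plant part affected, it is classified as seedling blast, leaf blast, leaf collar blast, node blast, panicle blast, neck blast, rachis-branch blast, and grain blast. What are the symptoms of neck blast and rachis-branch blast?

Retrieved paths: [{'relation': 'symptom', 'source': 'rice', 'target': 'jointing-stage symptoms'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'before the 3-leaf stage of rice'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'water-soaked spots appear on the bud and coleoptile'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'base of the diseased seedling turns dark brown'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'upper part turns yellowish brown or pale red'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'severely diseased seedlings wither and die'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'a gray-green mold layer may grow on the lesion under humid conditions'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'auricles are susceptible'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'initially dirty-green lesions'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'spreads irregularly to the leaf collar, ligule, leaf sheath, and leaf blade'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'lesions finally turn grayish white to grayish brown'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'a gray-green mold layer grows under humid conditions'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'diseased leaves wither early'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'readily leads to neck blast'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'mainly occurs on the first and second nodes below the panicle neck'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'initially small brown or dark-brown dots'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'expands in a ring around the whole node'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'a gray-green mold layer grows on the node under humid conditions'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'breaks easily'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'can also cause white heads'}, {'relation': 'symptom', 'source': 'neck blast', 'target': 'small light-brown dots'}, {'relation': 'symptom', 'source': 'neck blast', 'target': 'yellowish-white, brown, or black spots'}, {'relation': 'symptom', 'source': 'neck blast', 'target': 'completely white head'}, {'relation': 'symptom', 'source': 'rachis-branch blast', 'target': 'small light-brown dots'}, {'relation': 'symptom', 'source': 'rachis-branch blast', 'target': 'yellowish-white, brown, or black lesions'}, {'relation': 'symptom', 'source': 'rachis-branch blast', 'target': 'late infection leaves grains poorly filled'}, {'relation': 'symptom', 'source': 'grain blast', 'target': 'occurs on the hull and glumes'}, {'relation': 'symptom', 'source': 'grain blast', 'target': 'with early infection, lesions on the hull are large and elliptical with grayish-white centers'}, {'relation': 'symptom', 'source': 'grain blast', 'target': 'may spread over the whole grain, causing dark-gray or grayish-white shriveled grains'}, {'relation': 'symptom', 'source': 'grain blast', 'target': 'with late infection, elliptical or irregular brown spots'}, {'relation': 'symptom', 'source': 'grain blast', 'target': 'in severe cases, grains are poorly filled and kernels turn black'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'white-spot type'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'acute type'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'chronic type'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'brown-spot type'}]

Paths after information-entropy filtering: [{'relation': 'symptom', 'source': 'rice', 'target': 'jointing-stage symptoms'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'spreads irregularly to the leaf collar, ligule, leaf sheath, and leaf blade'}, {'relation': 'symptom', 'source': 'rachis-branch blast', 'target': 'yellowish-white, brown, or black lesions'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'initially dirty-green lesions'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'diseased leaves wither early'}, {'relation': 'symptom', 'source': 'node blast', 'target': 'initially small brown or dark-brown dots'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'upper part turns yellowish brown or pale red'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'before the 3-leaf stage of rice'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'severely diseased seedlings wither and die'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'base of the diseased seedling turns dark brown'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'brown-spot type'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'white-spot type'}, {'relation': 'symptom', 'source': 'rachis-branch blast', 'target': 'small light-brown dots'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'water-soaked spots appear on the bud and coleoptile'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'chronic type'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'lesions finally turn grayish white to grayish brown'}, {'relation': 'symptom type', 'source': 'leaf blast', 'target': 'acute type'}, {'relation': 'symptom', 'source': 'leaf collar blast', 'target': 'a gray-green mold layer grows under humid conditions'}, {'relation': 'symptom', 'source': 'neck blast', 'target': 'yellowish-white, brown, or black spots'}, {'relation': 'symptom', 'source': 'seedling blast', 'target': 'a gray-green mold layer may grow on the lesion under humid conditions'}]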

Table 8

Model-generated results under different knowledge constraints

Backbone	Model	GOOGLE BLEU	BLEU_1	BLEU_2	BLEU_3	BLEU_4	Mean_BLEU	ROUGE_1	ROUGE_2	ROUGE_3	BertScore/%
Baichuan-7b	KGLLM（Ours）	1.918 0	3.482 2	2.255 1	1.703 9	1.304 8	2.186 5	2.572 6	0.271 0	2.572 6	64.53
	Path Description	1.723 2	3.323 5	2.210 5	1.665 5	1.259 8	2.114 8	2.941 2	0.176 5	2.941 2	64.20
	Hard Constraint	0.759 7	1.052 9	0.558 2	0.387 1	0.283 8	0.570 5	0.160 0	0.000 0	0.160 0	54.46
Baichuan-13b	KGLLM（Ours）	1.276 7	5.844 3	3.900 5	2.796 0	2.043 1	3.646 0	5.608 0	0.132 5	5.608 0	64.37
	Path Description	1.835 7	5.556 7	3.643 6	2.620 7	1.933 2	3.438 5	5.219 8	0.239 6	5.219 8	63.95
	Hard Constraint	0.609 2	1.043 3	0.597 4	0.423 6	0.316 2	0.595 1	0.287 1	0.000 0	0.287 1	55.82
ChatGLM3-6B	KGLLM（Ours）	2.608 6	4.105 5	2.386 5	1.716 7	1.281 6	2.372 6	2.166 8	0.151 8	2.166 8	64.03
	Path Description	2.709 3	3.813 5	2.211 3	1.608 4	1.217 0	2.212 6	2.031 6	0.122 0	2.031 6	63.98
	Hard Constraint	0.379 8	0.549 3	0.285 3	0.187 9	0.130 4	0.288 2	0.138 5	0.000 0	0.138 5	51.03
Qwen1.5-7B	KGLLM（Ours）	1.441 9	3.154 4	1.923 2	1.420 7	1.084 7	1.895 8	1.192 9	0.039 1	1.192 9	61.41
	Path Description	1.157 5	2.665 4	1.616 2	1.180 0	0.888 7	1.587 6	1.383 8	0.060 2	1.383 8	60.20
	Hard Constraint	0.843 3	1.948 5	1.044 7	0.724 3	0.531 6	1.062 3	0.109 9	0.000 0	0.109 9	58.44
Qwen1.5-14B	KGLLM（Ours）	1.806 5	5.206 8	3.195 7	2.267 7	1.652 1	3.080 6	3.234 0	0.028 2	3.234 0	61.85
	Path Description	1.461 7	4.807 3	3.016 7	2.196 3	1.635 1	2.913 8	2.562 3	0.051 0	2.562 3	61.51
	Hard Constraint	0.956 2	2.129 2	1.274 2	0.946 5	0.721 2	1.267 8	0.174 1	0.000 0	0.174 1	60.01
Average	KGLLM（Ours）	1.810 3	4.358 6	2.732 2	1.981 0	1.473 3	2.636 3	2.954 9	0.124 5	2.954 9	63.24
	Path Description	1.777 5 ^{⬇0.032 9}	4.033 3 ^{⬇0.325 4}	2.539 7 ^{⬇0.192 5}	1.854 2 ^{⬇0.126 8}	1.386 8 ^{⬇0.086 5}	2.453 5 ^{⬇0.182 8}	2.827 7 ^{⬇0.127 1}	0.129 8 ^{⬆0.005 3}	2.827 7 ^{⬇0.127 1}	62.77 ^{⬇0.47}
	Hard Constraint	0.709 6 ^{⬇1.100 7}	1.344 6 ^{⬇3.014 0}	0.752 0 ^{⬇1.980 2}	0.533 9 ^{⬇1.447 1}	0.396 6 ^{⬇1.076 6}	0.756 8 ^{⬇1.879 5}	0.173 9 ^{⬇2.781 0}	0.000 0 ^{⬇0.124 5}	0.173 9 ^{⬇2.781 0}	55.95 ^{⬇7.28}
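
The gap between the soft, probability-adjusting guidance of KGLLM and the Hard Constraint baseline can be illustrated on a toy next-token distribution. This is a sketch under assumptions: the abstract states only that filtered knowledge paths adjust token prediction probabilities, so the additive `bonus` on logits, the reading of "Hard Constraint" as masking every token outside the knowledge path, and the function names are all hypothetical.

```python
import math
from typing import Dict, Set

Logits = Dict[str, float]  # token -> unnormalised score


def softmax(logits: Logits) -> Dict[str, float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits.values())
    exps = {t: math.exp(x - m) for t, x in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def soft_constraint(logits: Logits, kg_tokens: Set[str], bonus: float = 2.0) -> Dict[str, float]:
    """Raise the logits of knowledge-path tokens; other tokens stay possible."""
    boosted = {t: x + (bonus if t in kg_tokens else 0.0) for t, x in logits.items()}
    return softmax(boosted)


def hard_constraint(logits: Logits, kg_tokens: Set[str]) -> Dict[str, float]:
    """Allow only knowledge-path tokens; all other tokens get probability 0."""
    allowed = {t: x for t, x in logits.items() if t in kg_tokens}
    if not allowed:  # nothing to constrain to: fall back to the plain model
        return softmax(logits)
    probs = softmax(allowed)
    return {t: probs.get(t, 0.0) for t in logits}


# Toy vocabulary: "white" lies on the knowledge path, "the" is a function word.
logits = {"white": 1.0, "green": 1.5, "the": 2.0}
kg_tokens = {"white"}

soft = soft_constraint(logits, kg_tokens)
hard = hard_constraint(logits, kg_tokens)
print(max(soft, key=soft.get), round(soft["the"], 3))
print(hard)
```

Under the soft adjustment the knowledge token becomes the most likely choice while function words keep non-zero probability; under masking they are ruled out entirely. That difference is one plausible reading of why the Hard Constraint rows score far lower on the fluency-sensitive metrics in Table 8.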

Data type	Statistics
# QA pairs	2 664
# Total words in QA pairs	491 566
# Average words per QA pair	184.52
# Total words in questions	195 592
# Average words per question	73.42
# Total words in answers	295 974
# Average words per answer	111.10
