Key Factor Extraction Method of Agricultural User Demand Based on Large Language Models

doi:10.12133/j.smartag.SA202509011

Abstract

[Objective] In the agricultural domain, user demand texts are essential primary sources for agricultural extension, production management, and policy services. However, these texts typically contain highly specialized terminology, use non-standard, colloquial, and diverse expressions, present fragmented semantics, and rely heavily on contextual reasoning. These characteristics make them difficult to parse accurately with traditional rule-based approaches or shallow machine learning models, which often leads to biased demand classification and incomplete extraction of key factors, and in turn constrains the quality of data available for intelligent agricultural decision-making. To address these challenges, this study aimed to develop a robust, domain-adapted, and highly interpretable method for the structured analysis of agricultural user demands.

[Methods] Agri-NeedAgent, an agricultural user demand analysis framework, was proposed based on a "three-stage training + multi-agent collaboration" paradigm. First, in the domain knowledge pretraining stage, 80 000 agriculture-related texts, including crop cultivation manuals, pest and disease control guides, agricultural policy documents, and farmer consultation records, were used to build domain-specific semantic understanding, thereby enhancing the model's ability to interpret agricultural terminology, dialectal expressions, contextual logic, and implicit semantics. Second, in the instruction fine-tuning stage, 6 320 annotated samples in an "instruction-input-output" format were used to establish an explicit mapping from raw demand texts to structured outputs. Third, in the agricultural knowledge low-rank adaptation stage, Low-Rank Adaptation (LoRA) was applied to perform lightweight parameter tuning on task-specific agents, enabling targeted adaptation to the demand classification and key-factor extraction tasks. On top of this training process, a multi-agent collaborative framework was constructed in which a manager agent was responsible for task scheduling and quality control, while task agents performed demand classification, key-factor extraction, and explanation generation, respectively. Through this division of labor and collaboration, the framework achieved efficient and structured analysis of agricultural user demands.

[Results and Discussions] Experimental results demonstrated that Agri-NeedAgent achieved a demand classification accuracy of 84.6%, a key-factor extraction F₁-score of 85.2%, a structured interface compliance rate of 94.2%, and an interpretability score of 90.2. These results showed clear improvements over traditional deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) as well as general-purpose large language models (LLMs) without domain adaptation. The findings confirmed the critical role of domain knowledge injection, explicit task alignment, and multi-agent specialization in enhancing semantic understanding and structured analysis of agricultural texts. Ablation experiments further validated the effectiveness of each component. Removing domain pretraining or LoRA fine-tuning caused substantial performance degradation in classification and key-factor extraction, indicating that domain adaptation and task-specific optimization are necessary for handling non-standard agricultural expressions. Moreover, eliminating the manager agent or the Reasoning and Acting (ReAct) mechanism significantly reduced structured interface compliance and interpretability, highlighting the importance of task coordination, intermediate verification, and multi-step reasoning for ensuring logical consistency and output completeness. Additionally, removing the external knowledge base reduced the interpretability score from 90.2 to 77.6, underscoring its essential role in providing theoretical grounding, reasoning support, and professional explanations. Although the multi-agent collaboration introduced an additional inference overhead of approximately 140 ms, the overall per-sample inference time remained within 225 ms, meeting the real-time requirements of agricultural consultation scenarios.

[Conclusions] Supported by the "three-stage training + multi-agent collaboration" framework, LLMs can effectively address the challenges posed by non-standard expressions, semantic fragmentation, and multi-factor reasoning in agricultural user demand texts. The proposed method demonstrated significant improvements in demand classification, key-factor extraction, structured output compliance, and interpretability, providing high-quality and traceable structured data for intelligent agricultural decision-making. After domain adaptation and task-specific tuning, the model not only gained an enhanced capability for deep semantic analysis of agricultural user demands but also ensured the completeness and interpretability of its outputs through multi-agent coordination. The current workflow still requires optimization in data preparation, staged training, and knowledge-base updating. Future work will focus on expanding region-specific and emerging-technology-related demand data, developing a dynamically updated agricultural knowledge system, improving multi-agent coordination efficiency, and exploring cross-lingual agricultural demand analysis to further promote the application and deployment of agricultural large models across broader scenarios.
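
The "instruction-input-output" sample format and the structured interface compliance rate described in the abstract can be illustrated with a minimal sketch. The field names (`demand_type`, `key_factors`, `explanation`), the sample contents, and the compliance rule (the output parses as a JSON object and carries every required field with the expected type) are assumptions made for illustration; the abstract does not specify the actual schema.

```python
import json

# Hypothetical schema: the abstract names the three tasks but not the field names.
REQUIRED_FIELDS = {"demand_type": str, "key_factors": list, "explanation": str}

# One training sample in the "instruction-input-output" format (illustrative contents).
sample = {
    "instruction": "Classify the demand and extract its key factors as JSON.",
    "input": "Which water-saving apple varieties suit the Loess Plateau?",
    "output": json.dumps({
        "demand_type": "variety selection",
        "key_factors": ["Loess Plateau", "water saving", "apple"],
        "explanation": "The user asks for apple varieties adapted to a dry region.",
    }),
}


def is_compliant(raw_output: str) -> bool:
    """Return True if raw_output is a JSON object with all required, correctly typed fields."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    return all(
        isinstance(parsed.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )


def compliance_rate(outputs: list[str]) -> float:
    """Percentage of outputs that satisfy the assumed schema."""
    if not outputs:
        return 0.0
    return 100.0 * sum(is_compliant(o) for o in outputs) / len(outputs)
```

Under this assumed rule, free-text answers and JSON objects with missing fields both count as non-compliant, which is one plausible reading of how a compliance rate such as 94.2% could be measured.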

Key words: agricultural demand analysis, multi-agent, large language model, key factor extraction, domain fine-tuning

CLC Number: S126

LI Runteng, WANG Yiqun, LI Hongda, LI Jingchen, CHEN Wenbai. Key Factor Extraction Method of Agricultural User Demand Based on Large Language Models[J]. Smart Agriculture, 2026, 8(2): 265-278.

Figures/Tables (13): Tables 1-5, Figs. 1-8


Source	Data type	Number of texts	Format	Period
Agricultural forums	Text	14 000	JSON	2022-2025
User consultation records	Text	6 500	JSON	2021-2025
Expert interview transcripts	Text	2 300	JSON	2022-2024
Total	-	22 800	-	-

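Data source	Data type	Number of entities/records	Format
Agricultural books	Text	300 books	JSON
Agricultural technology Q&A	Text	5 000 question-answer pairs	JSON
Research papers	Text	150 papers	JSON
Agricultural policy documents	Text	220 documents	PDF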

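Rating level	Score range	Agricultural scenario example (corresponding to the demand types in this study)
Excellent	90-100	Fully covers the core elements of the user demand. For the demand "growing water-saving apples on the Loess Plateau", the explanation explicitly includes all key information: Loess Plateau (region), water saving (requirement), and apple (crop)
Good	70-89	Covers the core elements with a slight deviation in wording, e.g., omits the specific mention of "water saving" and only refers to "apple variety selection in the Loess Plateau region"
Acceptable	50-69	Partially covers the core elements, e.g., only mentions "apple variety selection" without the "Loess Plateau" regional information
Insufficient	30-49	Deviates on a core element, e.g., misstates "apple" as "pear", so the demand is misdirected
Missing	0-29	Has no semantic relation to the demand, e.g., the explanation is entirely unrelated to "growing water-saving apples on the Loess Plateau" and only repeats generic agricultural terms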
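
The rubric above maps a 0-100 score to one of five levels, which can be written as a small lookup. This is a sketch only: it assumes integer scores and English level names, neither of which is stated in the source.

```python
# Score bands from the rubric table: (lower bound, upper bound, level), inclusive.
BANDS = [
    (90, 100, "Excellent"),
    (70, 89, "Good"),
    (50, 69, "Acceptable"),
    (30, 49, "Insufficient"),
    (0, 29, "Missing"),
]


def rating_level(score: int) -> str:
    """Return the rubric level for an integer score between 0 and 100."""
    for low, high, level in BANDS:
        if low <= score <= high:
            return level
    raise ValueError(f"score must be between 0 and 100, got {score}")
```

For example, `rating_level(77)` returns `"Good"`, the level in which the rubric places an explanation that covers the core elements but omits "water saving".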

Model/Method	Demand classification accuracy/%	Key-factor extraction F₁-score/%	Compliance rate/%	Interpretability score
BERT	59.8±0.5	65.2±0.4	50.2±0.6	66.6±1.4
Qwen3-1.7B	56.5±0.4	61.1±0.5	78.7±0.3	77.9±0.8
Qwen3-4B	72.5±0.3	71.8±0.4	86.6±0.2	83.3±0.7
DeepSeek-R1:1.5B	55.2±0.5	60.7±0.4	77.6±0.3	76.1±0.9
DeepSeek-R1:7B	75.6±0.3	76.4±0.3	90.3±0.2	82.7±0.6
ChatGLM3-6B	70.5±0.3	71.2±0.4	83.8±0.2	81.5±0.6
Baichuan2-7B-Base	74.2±0.3	74.1±0.4	87.5±0.2	82.2±0.6
InternLM2.5-7B	77.8±0.4	76.0±0.3	90.8±0.3	83.7±0.6
Agri-NeedAgent	84.6±0.2	85.2±0.2	94.2±0.1	90.2±0.5
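
To make the size of the improvement concrete, the mean values in the comparison table can be used to compute the margin of Agri-NeedAgent over the strongest baseline on each metric. The sketch below copies the means from the table and ignores the reported standard deviations; the function and variable names are illustrative.

```python
# Metric order: accuracy (%), F1 (%), compliance rate (%), interpretability score.
METRICS = ("accuracy", "f1", "compliance", "interpretability")

# Mean values copied from the comparison table (standard deviations omitted).
BASELINES = {
    "BERT": (59.8, 65.2, 50.2, 66.6),
    "Qwen3-1.7B": (56.5, 61.1, 78.7, 77.9),
    "Qwen3-4B": (72.5, 71.8, 86.6, 83.3),
    "DeepSeek-R1:1.5B": (55.2, 60.7, 77.6, 76.1),
    "DeepSeek-R1:7B": (75.6, 76.4, 90.3, 82.7),
    "ChatGLM3-6B": (70.5, 71.2, 83.8, 81.5),
    "Baichuan2-7B-Base": (74.2, 74.1, 87.5, 82.2),
    "InternLM2.5-7B": (77.8, 76.0, 90.8, 83.7),
}
AGRI_NEEDAGENT = (84.6, 85.2, 94.2, 90.2)


def gains_over_best_baseline() -> dict[str, tuple[str, float]]:
    """For each metric, return (strongest baseline, Agri-NeedAgent's margin in points)."""
    result = {}
    for i, metric in enumerate(METRICS):
        best = max(BASELINES, key=lambda name: BASELINES[name][i])
        result[metric] = (best, round(AGRI_NEEDAGENT[i] - BASELINES[best][i], 1))
    return result
```

With the table's values, the margins are 6.8 points in accuracy and 3.4 in compliance rate (both over InternLM2.5-7B), 8.8 in F₁-score (over DeepSeek-R1:7B), and 6.5 in interpretability (over InternLM2.5-7B).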

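System configuration	Demand classification accuracy/%	Key-factor extraction F₁-score/%	Compliance rate/%	Interpretability score
Full system (Agri-NeedAgent)	84.6±0.2	85.2±0.2	94.2±0.1	90.2±0.5
Without ReAct chain-of-thought	82.3±0.3	82.6±0.4	87.1±0.4	87.2±0.5
Without few-shot prompting	81.5±0.2	81.7±0.2	86.6±0.2	87.5±0.6
Without MA	82.0±0.4	81.9±0.5	81.4±0.7	86.4±0.7
Without domain pretraining	78.8±0.3	79.5±0.4	91.3±0.2	88.6±0.6
Without instruction fine-tuning	80.4±0.2	80.6±0.2	90.8±0.2	87.4±0.5
Without LoRA fine-tuning	77.5±0.2	76.2±0.2	92.6±0.2	88.9±0.5
Without external knowledge base	83.7±0.2	84.4±0.2	93.9±0.1	77.6±0.9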
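
The ablation table can be read as a set of drops relative to the full system. The following sketch, using the table's mean values only (standard deviations omitted) and illustrative names, finds which removed component costs the most on each metric.

```python
# Metric order: accuracy (%), F1 (%), compliance rate (%), interpretability score.
METRICS = ("accuracy", "f1", "compliance", "interpretability")
FULL = (84.6, 85.2, 94.2, 90.2)

# Mean values after removing one component, copied from the ablation table.
ABLATIONS = {
    "ReAct chain-of-thought": (82.3, 82.6, 87.1, 87.2),
    "few-shot prompting": (81.5, 81.7, 86.6, 87.5),
    "MA": (82.0, 81.9, 81.4, 86.4),
    "domain pretraining": (78.8, 79.5, 91.3, 88.6),
    "instruction fine-tuning": (80.4, 80.6, 90.8, 87.4),
    "LoRA fine-tuning": (77.5, 76.2, 92.6, 88.9),
    "external knowledge base": (83.7, 84.4, 93.9, 77.6),
}


def largest_drop(metric: str) -> tuple[str, float]:
    """Return (removed component, drop in points) with the largest loss on a metric."""
    i = METRICS.index(metric)
    drops = {name: round(FULL[i] - vals[i], 1) for name, vals in ABLATIONS.items()}
    worst = max(drops, key=drops.get)
    return worst, drops[worst]
```

On these values, removing LoRA fine-tuning costs the most in accuracy (7.1 points) and F₁-score (9.0 points), removing MA costs the most in compliance rate (12.8 points), and removing the external knowledge base costs the most in interpretability (12.6 points), in line with the pattern the abstract describes.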