Smart Agriculture

ADON-R: A Method for Constructing an Agricultural Ontology Network Based on Semantic Similarity and Rule-Based Reasoning

CHEN Xiaojing1,2, LI Wei1,2, LI Ruihang1,2, LIN Jia2, YAO Qiong2, WU Wendi2, FAN Jingchao1,2, YAN Shen4, WANG Jian1,2, ZHANG Jianhua1,2, ZHOU Guomin1,2,3,5

  1. Institute of Agricultural Information, Chinese Academy of Agricultural Sciences/Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs/National Agricultural Science Data Center, Beijing 100081, China
  2. Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya 572024, China
  3. Western Research Institute, Chinese Academy of Agricultural Sciences, Changji 831100, China
  4. Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  5. Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014, China
  • Received: 2025-10-15 Online: 2026-01-20
  • Foundation items: National Key R&D Program of China (2022YFF0711800); Nanfan Special Projects of the Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences (YBXM2448, YBXM2340, YBXM2409, YBXM2410, YBXM2430, YBXM2508, YBXM2509); Central Public-interest Scientific Institution Basal Research Fund (JBYW-AII-2025-05); National Agricultural Science Data Center Project (NASDC2025XM11)
  • About authors:

    CHEN Xiaojing, M.S., research interest: agricultural information technology, E-mail:

    LI Wei, M.S. candidate, research interest: agricultural information technology, E-mail:

    CHEN Xiaojing and LI Wei contributed equally as co-first authors.

  • Corresponding authors:
    ZHOU Guomin, Ph.D., Professor, research interest: agricultural information technology, E-mail:
    ZHANG Jianhua, Ph.D., research interest: agricultural information technology, E-mail:

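Chinese abstract (English translation):

[Objective/Significance] Effective integration and in-depth mining of agricultural knowledge are crucial for advancing smart agriculture, yet multi-source, heterogeneous agricultural ontology data have produced a severe "knowledge silo" problem. Existing ontology reasoning methods are limited to a single semantic dimension and cannot comprehensively characterize complex biological relationships, which seriously constrains in-depth application of agricultural knowledge systems. [Methods] To address these problems, a complete multi-strategy fusion reasoning framework for agricultural ontology networks was built, and ADON-R (Agricultural Deep Ontology Network-Reasoner), an agricultural ontology network reasoning method that combines semantic similarity with rule-based reasoning, was proposed. First, a logic-rule-based basic relation reasoning module was designed that uses ten biological transitivity rules to precisely construct the core skeleton of the ontology network. Second, to uncover complex relevance relations, a graded evidence system covering five dimensions (definition, semantics, biological network, functional traits, and reference species) was created, and a relevance reasoning module was built on it. A core innovation of this module is that it is not only a reasoning mechanism but also a reliability quantification framework: it grades each relation by the number of independent evidence types the relation satisfies. For efficient reasoning, the module integrates an improved Jaccard similarity algorithm, a Jena-rule-based interpreter, and an efficient semantic similarity pipeline that combines the BioBERT pre-trained language model with FAISS (Facebook AI Similarity Search) vector retrieval, enabling deep mining of latent relations. Finally, a graded fusion strategy divided relations of different evidence strength into four tiers (I-IV), from which the agricultural ontology knowledge network was constructed. On the international STS-B (Semantic Textual Similarity Benchmark) test set, BioBERT achieved a Spearman correlation coefficient of 0.852 0, supporting its effectiveness in semantic understanding tasks, and ADON-R inferred a total of 1 305 312 new ontology relations. [Results and Discussion] Graded reasoning from Tier I to Tier IV establishes complex upstream-downstream relations between ontologies, and ADON-R substantially refines ontology structure and extends both internal and external ontology associations. Case analysis in a graph database further shows that the newly constructed network can reveal deep, cross-domain term associations. [Conclusions] The method offers a reference approach for agricultural knowledge mining, and the resulting agricultural relation network can largely meet the needs of agricultural researchers in multiple fields to query and organize knowledge.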
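The abstract mentions an improved Jaccard similarity algorithm for definition-based evidence but does not say what the improvement is. As a reference point only, the sketch below computes the standard Jaccard index |A ∩ B| / |A ∪ B| over the lower-cased word tokens of two term definitions. The tokenizer, the 0.5 threshold, and the term IDs are assumptions for illustration, and the all-pairs loop is quadratic, so it does not reflect how the authors scaled the comparison.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased alphanumeric word tokens (a deliberately simple, assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Standard Jaccard index |A & B| / |A | B| of two definitions' token sets."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0  # two empty definitions: treat as no evidence
    return len(ta & tb) / len(union)


def similar_pairs(definitions: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """All term pairs whose definition similarity reaches the (assumed) threshold.

    Brute-force O(n^2) comparison, for illustration only.
    """
    ids = sorted(definitions)
    pairs = []
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            score = jaccard(definitions[x], definitions[y])
            if score >= threshold:
                pairs.append((x, y, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical term IDs and definitions.
    defs = {
        "T1": "leaf of a rice plant",
        "T2": "leaf of a wheat plant",
        "T3": "soil moisture sensor",
    }
    for x, y, s in similar_pairs(defs):
        print(x, y, round(s, 3))  # prints: T1 T2 0.667
```

Token-set overlap rewards shared vocabulary regardless of word order, which suits short, formulaic ontology definitions; any weighting or normalization the authors added on top of it is not captured here.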

Abstract:

[Objective] The agricultural knowledge ecosystem has long been hindered by "knowledge silos" arising from heterogeneous, multi-source ontologies, which constitute a critical bottleneck impeding the advancement of smart agriculture. Existing ontology reasoning approaches are typically confined to a single semantic dimension or rely solely on formal logical rules, which makes them inadequate for capturing the intricate biological relationships and cross-disciplinary knowledge structures of agricultural domains. To address this challenge, this study proposes the Agricultural Deep Ontology Network-Reasoner (ADON-R), a framework that integrates semantic similarity with rule-based inference to construct a unified agricultural ontology network. The aim is to establish a hybrid reasoning architecture that combines logical rigor with semantic discovery, enabling the systematic integration of 28 internationally recognized agricultural ontologies into a richly interconnected, structurally refined, and reliability-quantified knowledge network. [Methods] A dual-track inference architecture was designed, comprising a basic relational reasoning module and a graded relevance-based reasoning module. In the basic module, ten biologically plausible transitivity rules (R1-R10) were manually formulated over four core relations defined by the Open Biological and Biomedical Ontology (OBO) Foundry: is_a, part_of, has_part, and regulates. These rules were implemented as explicit SPARQL (SPARQL Protocol and RDF Query Language) queries using the Apache Jena library to complete missing triples precisely, thereby establishing a logically consistent backbone for the knowledge network. This strategy prioritized interpretability and controllability and avoided the rule conflicts and redundant derivations that generic reasoners commonly introduce.
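To make the basic module concrete, here is a minimal sketch of rule-driven transitive completion run to a fixpoint. The paper implements its rules as SPARQL queries in Apache Jena; this version uses plain Python sets instead, and the `RULES` table is a hypothetical subset of chaining patterns of the kind commonly applied to OBO relations (for example, is_a followed by part_of entails part_of). It is not the authors' R1-R10.

```python
# Hypothetical chaining rules: (r1, r2) -> r3 means that (a r1 b) and (b r2 c)
# entail (a r3 c). Illustrative only; NOT the paper's R1-R10.
RULES: dict[tuple[str, str], str] = {
    ("is_a", "is_a"): "is_a",
    ("part_of", "part_of"): "part_of",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("has_part", "has_part"): "has_part",
    ("regulates", "part_of"): "regulates",
}

Triple = tuple[str, str, str]


def close(triples: set[Triple]) -> set[Triple]:
    """Apply RULES repeatedly until no new triple is derived (a fixpoint)."""
    known = set(triples)
    while True:
        # Index edges by subject so each (a r1 b) can be joined with (b r2 c).
        out_edges: dict[str, list[tuple[str, str]]] = {}
        for s, p, o in known:
            out_edges.setdefault(s, []).append((p, o))
        new: set[Triple] = set()
        for a, r1, b in known:
            for r2, c in out_edges.get(b, []):
                r3 = RULES.get((r1, r2))
                # Skip self-loops and triples that are already present.
                if r3 is not None and a != c and (a, r3, c) not in known:
                    new.add((a, r3, c))
        if not new:
            return known
        known |= new


if __name__ == "__main__":
    # Hypothetical seed triples.
    seed = {
        ("flag_leaf", "is_a", "leaf"),
        ("leaf", "part_of", "shoot"),
        ("shoot", "part_of", "plant"),
    }
    for t in sorted(close(seed) - seed):
        print(t)
    # ('flag_leaf', 'part_of', 'plant')
    # ('flag_leaf', 'part_of', 'shoot')
    # ('leaf', 'part_of', 'plant')
```

Keeping the rules in an explicit table mirrors the design choice described above: every derived triple can be traced to one named rule, which a general-purpose reasoner does not make as easy to audit.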
In the graded relevance-based module, a five-dimensional evidence framework was introduced, covering definition-based similarity, semantic similarity, biological network proximity, functional trait alignment, and taxonomic co-reference. Semantic similarity was computed by embedding term definitions with the BioBERT pre-trained language model and then running large-scale approximate nearest neighbor search with FAISS (Facebook AI Similarity Search) over more than 130 000 definition texts. To validate BioBERT's efficacy, comparative experiments were conducted on the STS-B (Semantic Textual Similarity Benchmark) test set, with performance evaluated by Spearman's rank correlation coefficient. The remaining four evidence dimensions were derived through string matching, Jena rule execution, and traversal of specific relation paths (e.g., regulates, subClassOf, only_in_taxon). Newly inferred relations were classified into four confidence tiers (I-IV) according to the number of independent evidence types supporting them: Tier I required at least three heterogeneous evidence types; Tier II required exactly two; Tier III relied solely on definition-based similarity; and Tier IV covered associations supported by a single non-definition evidence type. To mitigate error propagation, only Tier I-III relations were permitted to participate in subsequent transitive inference, under constrained conditions; Tier IV relations were excluded from further chaining because of insufficient evidential support. [Results and Discussion] The experimental pipeline integrated 167 887 terms and 249 603 initial relations from the 28 agricultural ontologies. ADON-R generated 1 305 312 new ontology relations. The basic reasoning module contributed 182 779 triples, substantially expanding the high-confidence is_a and part_of hierarchies through transitive closure. Within the graded relevance module, definition-based similarity yielded the largest volume of inferences (557 825 relations).
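The tiering rule stated above can be written as a small function. This sketch assumes each candidate relation already carries the set of evidence types that support it; the evidence labels are hypothetical names for the five dimensions, and because the abstract does not spell out the "constrained conditions" for chaining, `may_chain` encodes only the tier gate.

```python
from typing import Optional

# Hypothetical labels for the five evidence dimensions described above.
EVIDENCE_TYPES = frozenset({"definition", "semantic", "network", "function", "taxon"})
CHAINABLE_TIERS = frozenset({"I", "II", "III"})


def assign_tier(evidence: set[str]) -> Optional[str]:
    """Tier a relation by how many independent evidence types support it."""
    unknown = evidence - EVIDENCE_TYPES
    if unknown:
        raise ValueError(f"unknown evidence types: {sorted(unknown)}")
    if len(evidence) >= 3:
        return "I"
    if len(evidence) == 2:
        return "II"
    if len(evidence) == 1:
        # A single type: definition-only evidence is Tier III, anything else Tier IV.
        return "III" if "definition" in evidence else "IV"
    return None  # no supporting evidence, so no inferred relation


def may_chain(tier: Optional[str]) -> bool:
    """Only Tier I-III relations may feed further transitive inference."""
    return tier in CHAINABLE_TIERS


if __name__ == "__main__":
    # Hypothetical candidate relations and their supporting evidence.
    candidates = {
        ("T1", "T2"): {"semantic", "network", "taxon"},
        ("T1", "T3"): {"definition", "semantic"},
        ("T2", "T4"): {"definition"},
        ("T3", "T5"): {"function"},
    }
    for pair, ev in candidates.items():
        tier = assign_tier(ev)
        print(pair, tier, may_chain(tier))
```

Counting distinct evidence *types* rather than raw matches is what makes the tiers a reliability scale: two hits from the same dimension still count as one type of support.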
Notably, only four Tier I relations were produced, reflecting the method's "high precision, low recall" design principle, which requires consensus among multiple orthogonal evidence streams. Tier II comprised 3 539 relations, and Tiers III and IV contained 557 825 and 561 165 relations, respectively; together the tiers form a spectrum of inferred associations ranging from highly reliable to exploratory. On the STS-B test set, BioBERT achieved a Spearman correlation of 0.852 0, slightly below general-domain BERT (0.868 1) but above the biomedical and clinical models ClinicalBERT (0.844 2) and BlueBERT (0.818 0), supporting its use for semantic understanding in this domain-specific setting. Case studies in a graph database further illustrated ADON-R's capacity to uncover deep cross-ontology connections. [Conclusions] The ADON-R framework constructs a large-scale, structurally granular, and reliability-stratified agricultural ontology network, mitigating knowledge fragmentation across heterogeneous sources. By combining logical rule inference with deep semantic modeling, ADON-R preserves the logical integrity of core ontological structures while substantially enhancing the discovery of latent cross-domain associations. Its evidence-grading mechanism attaches actionable confidence labels to automatically inferred relations, which is expected to improve the adaptability and robustness of the knowledge network in real-world applications. Although rigorous empirical validation remains pending, ADON-R provides a methodological foundation for knowledge infrastructure in smart agriculture.
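Spearman's rank correlation, the metric used for the STS-B comparison, is the Pearson correlation computed on ranks, with tied values given the mean of the ranks they occupy. A minimal dependency-free sketch follows; the sample scores are made up, and in practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j carry ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(pred: list[float], gold: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    x, y = average_ranks(pred), average_ranks(gold)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when all values in one sequence are tied")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    gold = [0.0, 1.5, 3.2, 4.8, 5.0]       # hypothetical STS-B-style gold scores (0-5)
    pred = [0.10, 0.35, 0.30, 0.80, 0.95]  # hypothetical model cosine similarities
    print(round(spearman(pred, gold), 4))  # prints 0.9
```

Because the statistic depends only on rank order, a model's raw cosine similarities can be compared with 0-5 gold scores without any calibration; averaging tied ranks matters because gold similarity scores often repeat values.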

Key words: ontology reasoning, pre-trained language model, vector retrieval, rule-based reasoning, graded evidence, agricultural ontology network
