
Smart Agriculture ›› 2021, Vol. 3 ›› Issue (1): 118-128. doi: 10.12133/j.smartag.2021.3.1.202012-SA001

• Information Processing and Decision Making •

Agricultural Named Entity Recognition Based on Semantic Aggregation and Model Distillation

LI Liangde1,2(), WANG Xiujuan1,3, KANG Mengzhen1,2, HUA Jing1,4, FAN Menghan1,2   

  1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
    3. Beijing Engineering Research Center of Intelligent Systems and Technology, Beijing 100190, China
    4. Qingdao Smart AgriTech Co., Ltd., Qingdao 266000, China
  • Received: 2020-12-17 Revised: 2021-02-09 Online: 2021-03-30
  • Supported by:
    Strategic Priority Research Program of the Chinese Academy of Sciences (Class A) (XDA20030102); General Program of the National Natural Science Foundation of China (62076239); Joint Research Project of the Chinese Academy of Sciences and the National Science and Technology Development Agency of Thailand (GJHZ2076)
  • About the author: LI Liangde (1996-), male, master's student; research interest: natural language processing for agriculture. E-mail: liliangde2018@ia.ac.cn
  • Corresponding author:

Agricultural Named Entity Recognition Based on Semantic Aggregation and Model Distillation

LI Liangde1,2(), WANG Xiujuan1,3, KANG Mengzhen1,2, HUA Jing1,4, FAN Menghan1,2   

  1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    2.School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
    3.Beijing Engineering Research Center of Intelligent Systems and Technology, Beijing 100190, China
    4. Qingdao Smart AgriTech Co., Ltd., Qingdao 266000, China
  • Received: 2020-12-17 Revised: 2021-02-09 Online: 2021-03-30

Abstract (Chinese):

Labeled data for agricultural named entity recognition is currently scarce, and some open agricultural entity recognition models rely on hand-crafted features, resulting in low recognition accuracy. Although some deep-learning-based agricultural entity recognition models achieve better recognition performance, they suffer from high inference latency and large parameter counts. This study proposed an agricultural entity recognition method based on knowledge distillation. First, an agricultural knowledge graph was constructed from massive agricultural data on the Internet, and weakly labeled corpora were then obtained from it through distant supervision. Second, tailored to the characteristics of entity recognition, an attention-based BERT layer aggregation model (BERT-ALA) was proposed to fuse semantic features from different layers; combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), the resulting BERT-ALA+BiLSTM+CRF model served as the teacher model. Finally, a BiLSTM+CRF model was used as the student model to distill the teacher model, so that the model's prediction time and parameter count met the requirements of online serving. Experiments on the agricultural entity recognition dataset constructed in this study and on two public datasets showed that the macro-F1 of the BERT-ALA+BiLSTM+CRF model improved by an average of 1% over the baseline BERT+BiLSTM+CRF model. The macro-F1 of the distilled student model BiLSTM+CRF improved by an average of 3.3% over the same model trained on the original data, while prediction time decreased by 33% and storage space by 98%. The experimental results verify the effectiveness of the attention-based BERT layer aggregation model and of knowledge distillation for agricultural entity recognition.
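The layer aggregation at the heart of BERT-ALA can be pictured as an attention-weighted sum over BERT's per-layer hidden states. The sketch below is an illustrative, framework-agnostic version of that idea; the function and parameter names are hypothetical and not taken from the paper.

```python
import numpy as np

def aggregate_layers(hidden_states, layer_scores):
    """Attention-weighted aggregation of per-layer representations.

    hidden_states: array of shape (num_layers, seq_len, hidden_size),
                   e.g. the stacked hidden states of all BERT layers.
    layer_scores:  array of shape (num_layers,), learnable attention
                   scores (here supplied directly for illustration).
    Returns the weighted sum over the layer axis: (seq_len, hidden_size).
    """
    # softmax over the layer axis turns scores into aggregation weights
    w = np.exp(layer_scores - layer_scores.max())
    w = w / w.sum()
    # contract the layer axis: sum_l w[l] * hidden_states[l]
    return np.tensordot(w, hidden_states, axes=(0, 0))
```

In a trained model the `layer_scores` would be learned parameters, letting the model adaptively emphasize the lower layers that, as the abstract notes, matter most for entity recognition.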

Key words (Chinese): distant supervision, agricultural knowledge graph, agricultural Q&A system, named entity recognition, knowledge distillation, deep learning, BERT, bidirectional long short-term memory network

Abstract:

With the development of smart agriculture, automatic question answering (Q&A) over agricultural knowledge is needed to improve the efficiency of agricultural information acquisition. Agricultural named entity recognition plays a key role in an automatic Q&A system: it helps obtain information, understand agricultural questions, and retrieve answers from the knowledge graph. Because labeled agricultural named entity data are scarce, some existing open agricultural entity recognition models rely on manual features, which reduces the accuracy of entity recognition. In this work, a model distillation approach was proposed for agricultural named entity recognition. Firstly, massive agricultural data were collected from the Internet and an agricultural knowledge graph (AgriKG) was constructed. To overcome the scarcity of labeled agricultural entity data, weak named entity labels were produced on agricultural texts crawled from the Internet with the help of AgriKG. This approach derives from distant supervision, which was originally introduced to address the scarcity of labeled relation extraction data. Secondly, a large-scale pretrained language model, BERT, was used for agricultural named entity recognition; it provides good initial parameters encoding substantial basic language knowledge and was fine-tuned with the existing labeled data. Considering that agricultural named entity recognition relies heavily on low-level semantic features but only slightly on high-level ones, an attention-based layer aggregation mechanism for BERT (BERT-ALA) was designed to adaptively aggregate the outputs of BERT's multiple hidden layers. On top of BERT-ALA, a bidirectional LSTM (BiLSTM) and a conditional random field (CRF) were coupled to further improve recognition precision, giving the BERT-ALA+BiLSTM+CRF model. The BiLSTM compensates for BERT's insufficient modeling of relative position features, while the CRF models the dependencies among entity labels.
Thirdly, since the BERT-ALA+BiLSTM+CRF model is difficult to serve online because of its high time and space complexity, a BiLSTM+CRF model was used as the student model to distill the BERT-ALA+BiLSTM+CRF teacher, fitting the teacher's outputs at both the BiLSTM layer and the CRF layer. Experiments on the dataset constructed in this research, as well as on two open datasets, showed that (1) the macro-F1 of the BERT-ALA+BiLSTM+CRF model improved by 1% over the baseline BERT+BiLSTM+CRF model, and (2) compared with the same model trained on the original data, the macro-F1 of the distilled student model BiLSTM+CRF increased by an average of 3.3%, prediction time was reduced by 33%, and storage space was reduced by 98%. The experimental results verify the effectiveness of BERT-ALA and knowledge distillation in agricultural entity recognition.
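The two-part distillation objective described above, in which the student fits both the teacher's BiLSTM-layer output and its label distribution, can be sketched as a combined loss. This is a minimal illustration under stated assumptions: the use of mean squared error for the hidden states, cross-entropy against teacher soft targets for the labels, and the `alpha` balancing weight are choices made here for clarity, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_hidden, teacher_hidden,
                      student_logits, teacher_logits, alpha=0.5):
    """Combined distillation loss (illustrative sketch).

    student_hidden / teacher_hidden: (seq_len, hidden_size) BiLSTM outputs.
    student_logits / teacher_logits: (seq_len, num_labels) label scores.
    alpha: hypothetical weight balancing the two terms.
    """
    # term 1: match the teacher's intermediate BiLSTM representation
    mse = np.mean((student_hidden - teacher_hidden) ** 2)
    # term 2: match the teacher's soft label distribution
    p_teacher = softmax(teacher_logits)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    ce = -np.mean(np.sum(p_teacher * log_p_student, axis=-1))
    return alpha * mse + (1 - alpha) * ce
```

Because the student only needs to imitate these two outputs, the small BiLSTM+CRF model can absorb much of the teacher's knowledge while keeping the prediction latency and storage footprint reported in the abstract.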

Key words: distant supervision, agriculture knowledge graph, agriculture Q&A system, named entity recognition, knowledge distillation, deep learning, BERT, Bi-LSTM

CLC number: