Peng, Cheng; Wang, Xiajun; Li, Qifeng; Yu, Qinyang; Jiang, Ruixiang; Ma, Weihong; Wu, Wenbiao; Meng, Rui; Li, Haiyan; Huai, Heju; Wang, Shuyan; He, Longjuan

doi:10.3390/app14166944

Title: A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning.
Authors: Peng, Cheng; Wang, Xiajun; Li, Qifeng; Yu, Qinyang; Jiang, Ruixiang; Ma, Weihong; Wu, Wenbiao; Meng, Rui; Li, Haiyan; Huai, Heju; Wang, Shuyan; He, Longjuan
Abstract: Featured Application: Our work provides reliable technical support for extracting information on pig diseases from Chinese text. It can also be applied to other domain-specific fields, facilitating the adaptation of named entity recognition across diverse contexts. Named Entity Recognition (NER) is a fundamental stage in building knowledge-based support systems, such as knowledge retrieval and question-answering systems. In the pig disease domain, Chinese NER models face several challenges: scarce annotated data, domain-specific vocabulary, diverse entity categories, and ambiguous entity boundaries. To address these challenges, we propose PDCNER, a Pig Disease Chinese Named Entity Recognition method that leverages lexicon-enhanced BERT and contrastive learning. First, we construct a domain-specific lexicon and pre-train word embeddings for the pig disease domain. Second, we integrate pig disease lexicon information into the lower layers of BERT through a Lexicon Adapter layer that operates on char–word pair sequences. Third, to enhance feature representation, we add a lexicon-enhanced contrastive loss layer on top of BERT. Finally, a Conditional Random Field (CRF) layer serves as the model's decoder. Experimental results show that the proposed model outperforms several mainstream models, achieving a precision of 87.76%, a recall of 86.97%, and an F1-score of 87.36%. With only 10% of the samples available, it outperforms BERT-BiLSTM-CRF and LEBERT by 14.05% and 6.8%, respectively, demonstrating its robustness when data are scarce. Furthermore, the model generalizes well to publicly available datasets.
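The char–word pair sequence consumed by the Lexicon Adapter can be illustrated with a minimal sketch. It assumes simple exhaustive substring matching against a toy lexicon, in which each character is paired with every multi-character lexicon word that covers it. The lexicon entries, the `max_word_len` cap, and the function name `char_word_pairs` are hypothetical. This is not the authors' lexicon or code.

```python
from typing import List, Set, Tuple


def char_word_pairs(sentence: str, lexicon: Set[str],
                    max_word_len: int = 4) -> List[Tuple[str, List[str]]]:
    """Pair every character with the lexicon words that contain it.

    Hypothetical helper using exhaustive substring matching; it only
    illustrates the idea of char-word pairs, not PDCNER's implementation.
    """
    matches: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring beginning at `start`, up to max_word_len chars.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if len(word) > 1 and word in lexicon:
                # Attach the matched word to every character it covers.
                for i in range(start, end):
                    matches[i].append(word)
    return list(zip(sentence, matches))


# Toy lexicon (hypothetical entries, not the paper's domain lexicon).
lexicon = {"猪瘟", "猪瘟病毒", "病毒", "感染"}
pairs = char_word_pairs("猪瘟病毒感染", lexicon)
for char, words in pairs:
    print(char, words)
```

Here the character 病 is paired with both 猪瘟病毒 and 病毒. This overlap lets lexicon information help resolve ambiguous entity boundaries. Practical implementations typically use a trie for matching and cap the number of words kept per character.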
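The abstract does not define the lexicon-enhanced contrastive loss. The following sketch therefore swaps in a generic supervised contrastive (SupCon) loss over token representations, assuming cosine similarity and a temperature `tau`. Tokens that share an entity label are pulled together, and all other tokens are pushed apart. The function names, labels, and feature values are hypothetical. This is not the PDCNER loss.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def sup_con_loss(features: List[List[float]], labels: List[str],
                 tau: float = 0.1) -> float:
    """Generic supervised contrastive loss, averaged over anchors with positives.

    Illustrative stand-in only; the paper's lexicon-enhanced variant may differ.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarities from anchor i to every other token.
        sims = {j: cosine(features[i], features[j]) / tau
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner are skipped
        # Mean negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = sup_con_loss(feats, ["DIS", "DIS", "O", "O"])     # same-label tokens are close
misaligned = sup_con_loss(feats, ["DIS", "O", "DIS", "O"])  # same-label tokens are far apart
print(aligned, misaligned)
```

The loss is low when same-label representations already cluster and high when they do not. Minimizing it sharpens the features passed to the decoder.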
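The CRF decoder's inference step can be sketched with Viterbi decoding for a linear-chain CRF. The sketch assumes the emission scores (for example, projected from BERT features) and the transition scores are already computed. The tag set (`O`, `B`, `I`), the score values, and the function name are hypothetical, and training (the forward algorithm for the partition function) is omitted.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   start: List[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path of a linear-chain CRF and its score.

    emissions[t][k]: score of tag k at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    start[k]: score of beginning the sequence with tag k.
    """
    num_tags = len(start)
    # score[k]: best path score ending in tag k at the current position.
    score = [start[k] + emissions[0][k] for k in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best_last = max(range(num_tags), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical tags: 0 = O, 1 = B, 2 = I. O->I and starting with I are forbidden.
start = [0.0, 0.0, -10000.0]
transitions = [[0.0, 0.0, -10000.0],
               [0.0, -1.0, 1.0],
               [0.0, -1.0, 0.5]]
emissions = [[0.1, 2.0, 0.5], [1.0, 0.2, 0.9], [2.0, 0.1, 0.3]]
path, best = viterbi_decode(emissions, transitions, start)
print(path, best)  # prints [1, 2, 0] 5.9
```

Picking the best tag at each position independently would yield B, O, O. The transition scores instead favor the well-formed B, I, O sequence. This is how a CRF layer enforces consistent entity boundaries.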
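The reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(87.76, 86.97)
print(f"{reported_f1:.2f}")  # prints 87.36
```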
Subjects: DATA mining; CHINESE language; RANDOM fields; TECHNICAL information; SWINE
Publication: Applied Sciences (2076-3417), 2024, Vol 14, Issue 16, p6944
ISSN: 2076-3417
Publication type: Article
DOI: 10.3390/app14166944