- Title
A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon-Enhanced BERT and Contrastive Learning.
- Authors
Peng, Cheng; Wang, Xiajun; Li, Qifeng; Yu, Qinyang; Jiang, Ruixiang; Ma, Weihong; Wu, Wenbiao; Meng, Rui; Li, Haiyan; Huai, Heju; Wang, Shuyan; He, Longjuan
- Abstract
Featured Application: Our work provides reliable technical support for the information extraction of pig diseases in Chinese. It can be applied to other domain-specific fields, thereby facilitating seamless adaptation for named entity identification across diverse contexts. Named Entity Recognition (NER) is a fundamental and pivotal stage in the development of various knowledge-based support systems, including knowledge retrieval and question-answering systems. In the domain of pig diseases, Chinese NER models encounter several challenges, such as the scarcity of annotated data, domain-specific vocabulary, diverse entity categories, and ambiguous entity boundaries. To address these challenges, we propose PDCNER, a Pig Disease Chinese Named Entity Recognition method leveraging lexicon-enhanced BERT and contrastive learning. Firstly, we construct a domain-specific lexicon and pre-train word embeddings in the pig disease domain. Secondly, we integrate lexicon information of pig diseases into the lower layers of BERT using a Lexicon Adapter layer, which employs char–word pair sequences. Thirdly, to enhance feature representation, we propose a lexicon-enhanced contrastive loss layer on top of BERT. Finally, a Conditional Random Field (CRF) layer is employed as the model's decoder. Experimental results show that our proposed model demonstrates superior performance over several mainstream models, achieving a precision of 87.76%, a recall of 86.97%, and an F1-score of 87.36%. The proposed model outperforms BERT-BiLSTM-CRF and LEBERT by 14.05% and 6.8%, respectively, with only 10% of the samples available, showcasing its robustness in data scarcity scenarios. Furthermore, the model exhibits generalizability across publicly available datasets.
Our work provides reliable technical support for the information extraction of pig diseases in Chinese and can be easily extended to other domains, thereby facilitating seamless adaptation for named entity identification across diverse contexts.
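The abstract mentions char–word pair sequences as the input to the Lexicon Adapter layer. As a rough illustration only, the sketch below pairs each character of a sentence with the lexicon words that cover it. The function name `char_word_pairs`, the `max_word_len` parameter, and the three-word toy lexicon are assumptions made for this example; they are not taken from the paper, whose actual lexicon and matching procedure may differ.

```python
from typing import List, Set, Tuple


def char_word_pairs(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Tuple[str, List[str]]]:
    """Pair every character with the lexicon words whose span covers it.

    Hypothetical helper: a brute-force substring match against a small
    in-memory lexicon, not the paper's implementation.
    """
    matched: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Only multi-character candidates are tried (length >= 2).
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > n:
                break
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to every character it spans.
                for i in range(start, end):
                    matched[i].append(word)
    return list(zip(sentence, matched))


# Toy lexicon (assumed for illustration): "swine fever",
# "African swine fever", "virus".
toy_lexicon = {"猪瘟", "非洲猪瘟", "病毒"}
pairs = char_word_pairs("非洲猪瘟病毒", toy_lexicon)
for ch, words in pairs:
    print(ch, words)
```

In a lexicon-enhanced model, each character's matched words would then be looked up in the pre-trained word embeddings and fused with the character representation; that fusion step is outside the scope of this sketch.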
- Subjects
DATA mining; CHINESE language; RANDOM fields; TECHNICAL information; SWINE
- Publication
Applied Sciences (2076-3417), 2024, Vol 14, Issue 16, p6944
- ISSN
2076-3417
- Publication type
Article
- DOI
10.3390/app14166944