- Title
Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?
- Authors
Tran, Hanh Thi Hong; Martinc, Matej; Repar, Andraž; Ljubešić, Nikola; Doucet, Antoine; Pollak, Senja
- Abstract
Automatic term extraction (ATE) is a natural language processing task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. In this paper, we treat ATE as a sequence-labeling task and explore the efficacy of XLMR in evaluating cross-lingual and multilingual learning against monolingual learning in the cross-domain ATE context. Additionally, we introduce NOBI, a novel annotation mechanism enabling the labeling of single-word nested terms. Our experiments are conducted on the ACTER corpus, encompassing four domains and three languages (English, French, and Dutch), as well as the RSDO5 Slovenian corpus, encompassing four additional domains. Results indicate that cross-lingual and multilingual models outperform monolingual settings, showcasing improved F1-scores for all languages within the ACTER dataset. When incorporating an additional Slovenian corpus into the training set, the multilingual model exhibits superior performance compared to state-of-the-art approaches in specific scenarios. Moreover, the newly introduced NOBI labeling mechanism significantly enhances the classifier's capacity to extract short nested terms, leading to substantial improvements in Recall for the ACTER dataset and consequently boosting the overall F1-score performance.
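The abstract frames ATE as sequence labeling and introduces NOBI for single-word nested terms. A minimal Python sketch of the general idea follows; the tag names, the `nobi_tags` helper, and the example sentence are illustrative assumptions, not the paper's exact scheme, which extends BIO so that a single-word term nested inside a longer term can carry its own label.

```python
# Illustrative sketch only: a hypothetical NOBI-style tagger, assuming B/I/O
# marks the outer multi-word term and an extra "N" component flags a token
# that is also a standalone single-word term nested inside it.

def nobi_tags(tokens, outer_spans, nested_words):
    """Assign illustrative NOBI-style tags.

    tokens:       list of word strings
    outer_spans:  (start, end) index pairs of multi-word terms, end exclusive
    nested_words: indices of tokens that are also single-word terms
                  nested inside an outer term
    """
    tags = ["O"] * len(tokens)
    for start, end in outer_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    # Append the hypothetical N marker where a token inside a term
    # is itself a valid single-word term.
    for i in nested_words:
        if tags[i] != "O":
            tags[i] += "N"
    return tags

tokens = ["automatic", "term", "extraction", "improves", "annotation"]
# "automatic term extraction" is a multi-word term; "term" alone is also a term.
print(nobi_tags(tokens, outer_spans=[(0, 3)], nested_words={1}))
# -> ['B', 'IN', 'I', 'O', 'O']
```

Plain BIO would lose the nested term "term" here; a scheme along these lines lets a token carry both its position in the outer term and its status as a term in its own right, which is what drives the Recall gains the abstract reports.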
- Subjects
NATURAL language processing; LINGUISTIC context; ENGLISH language; FRENCH language; DUTCH language
- Publication
Machine Learning, 2024, Vol 113, Issue 7, p4285
- ISSN
0885-6125
- Publication type
Article
- DOI
10.1007/s10994-023-06506-7