- Title
A Scenario-Generic Neural Machine Translation Data Augmentation Method.
- Authors
Liu, Xiner; He, Jianshu; Liu, Mingzhe; Yin, Zhengtong; Yin, Lirong; Zheng, Wenfeng
- Abstract
Data sparsity remains a major obstacle in neural machine translation despite the field's rapid advancement. To address it, this study proposes a general data augmentation technique applicable across scenarios. It examines the difficulty of obtaining diverse, high-quality parallel corpora in both rich- and low-resource settings, and combines low-frequency word substitution with reverse translation so that the two methods complement each other. Specifically, the method improves the pseudo-parallel corpus generated by reverse translation by substituting low-frequency words, and it includes a grammatical error correction module to reduce grammatical errors in low-resource scenarios. The experimental data are partitioned into rich- and low-resource scenarios at a 10:1 ratio, and the experiments verify the necessity of grammatical error correction for the pseudo-corpus in low-resource scenarios. Comparative experiments use models and methods drawn from the backbone network and related literature. The findings indicate that the proposed data augmentation approach suits both rich- and low-resource scenarios and enhances the training corpus effectively, improving translation performance.
- Subjects
MACHINE translating; DATA augmentation; COMPARATIVE literature; TASK performance; CORPORA; ERROR correction (Information theory)
- Publication
Electronics (2079-9292), 2023, Vol 12, Issue 10, p2320
- ISSN
2079-9292
- Publication type
Article
- DOI
10.3390/electronics12102320