- Title
DiaBLa: a corpus of bilingual spontaneous written dialogues for machine translation.
- Authors
Bawden, Rachel; Bilinski, Eric; Lavergne, Thomas; Rosset, Sophie
- Abstract
We present a new English–French dataset for the evaluation of Machine Translation (MT) for informal, written bilingual dialogue. The test set contains 144 spontaneous dialogues (5700+ sentences) between native English and French speakers, mediated by one of two neural MT systems in a range of role-play settings. The dialogues are accompanied by fine-grained sentence-level judgments of MT quality, produced by the dialogue participants themselves, as well as by manually normalised versions and reference translations produced a posteriori. The motivation for the corpus is twofold: to provide (i) a unique resource for evaluating MT models, and (ii) a corpus for the analysis of MT-mediated communication. We provide an initial analysis of the corpus to confirm that the participants' judgments reveal perceptible differences in MT quality between the two MT systems used.
- Subjects
MACHINE translating; CORPORA; TRANSLATING & interpreting
- Publication
Language Resources & Evaluation, 2021, Vol 55, Issue 3, p635
- ISSN
1574-020X
- Publication type
Article
- DOI
10.1007/s10579-020-09514-4