- Title
Abstractive Summarization of Broadcast News Stories for Estonian.
- Authors
HÄRM, Henry; ALUMÄE, Tanel
- Abstract
We present an approach for generating abstractive summaries for Estonian spoken news stories in a low-resource setting. Given a recording of a radio news story, the goal is to create a summary that captures the essential information in a short format. The approach consists of two steps: automatically generating the transcript and applying a state-of-the-art text summarization system to generate the result. We evaluated a number of models, with the best-performing model leveraging the large English BART model pre-trained on the CNN/DailyMail dataset and fine-tuned on machine-translated in-domain data, and with the test data translated to English and back. The method achieved a ROUGE-1 score of 17.22, improving on the alternatives and achieving the best result in human evaluation. The applicability of the proposed solution might be limited for languages where machine translation systems are not mature. In such cases multilingual BART should be considered, which achieved a ROUGE-1 score of 17.00 overall and a score of 16.22 without machine-translation-based data augmentation.
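The ROUGE-1 scores quoted in the abstract measure unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that idea, not the evaluation code used in the article: the function name `rouge1`, the lowercasing, and the whitespace tokenization are assumptions made for illustration, and the abstract does not say which ROUGE-1 variant (recall or F1) or which tokenizer produced the reported numbers.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (illustrative helper).

    Assumes lowercased, whitespace-split tokens; published ROUGE
    implementations may tokenize and stem differently.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat", "the cat sat on the mat")
    print(scores)
```

Scores such as 17.22 are conventionally reported on a 0 to 100 scale, so a value returned by this sketch would presumably be multiplied by 100 before comparison.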
- Subjects
BROADCAST journalism; DAILY Mail (Newspaper); MACHINE translating; DATA augmentation; DATABASES
- Publication
Baltic Journal of Modern Computing, 2022, Vol 10, Issue 3, p511
- ISSN
2255-8942
- Publication type
Article
- DOI
10.22364/bjmc.2022.10.3.23