Title: AGH corpus of Polish speech.
Authors: Żelasko, Piotr; Ziółko, Bartosz; Jadczyk, Tomasz; Skurzok, Dawid
Abstract: A corpus of Polish speech, collected for use in automatic speech recognition (ASR) and text-to-speech (TTS) systems, is presented. The corpus consists of several groups of recordings: read sentences, spoken commands, a phonetically balanced TTS training corpus, telephonic speech, and others. The total duration of the recordings exceeds 25 h. The corpus contains 166 unique speakers, the majority of them in the 20-35 age group and one third of them female. An analysis of unique word occurrence frequency in relation to larger text resources has been conducted, and the most commonly appearing words have been identified and presented. The corpus was used as training data for an ASR system. Cross-validation training and testing of the SARMATA ASR system on our corpus yielded a phrase recognition rate of 91.9%. The corpus was additionally evaluated in a comparative test against the CORPORA corpus, which showed a major increase in phrase recognition rate in favour of our corpus.
Subjects: GENICULATE bodies; AUTOMATIC speech recognition; TEXT-to-speech software; CORPORA; SPEECH perception
Publication: Language Resources & Evaluation, 2016, Vol 50, Issue 3, p585
ISSN: 1574-020X
Publication type: Article
DOI: 10.1007/s10579-015-9302-y
