Lyu, Dau-Cheng; Tan, Tien-Ping; Chng, Eng-Siong; Li, Haizhou

doi:10.1007/s10579-015-9303-x

Title: Mandarin-English code-switching speech corpus in South-East Asia: SEAME.
Authors: Lyu, Dau-Cheng; Tan, Tien-Ping; Chng, Eng-Siong; Li, Haizhou
Abstract: This paper introduces the South East Asia Mandarin-English corpus, a 63-h spontaneous Mandarin-English code-switching transcribed speech corpus suitable for LVCSR and language change detection/identification research. The corpus is recorded under unscripted interview and conversational settings from 157 Singaporean and Malaysian speakers who spoke a mixture of Mandarin and English within a single sentence. About 82% of the transcribed utterances are intra-sentential code-switching speech, and the corpus will be released by LDC in 2015. This paper presents an analysis of the code-switching statistics of the corpus, such as the duration of monolingual segments and the frequency of language turns in code-switch utterances. We also summarize the development effort, including details such as the processing time for transcription, validation and language boundary labelling. Lastly, we present textual analyses of code-switch segments, examining the word length of monolingual segments in code-switch utterances and the most common single-word and two-word phrases of such segments.
Subjects: CHINA; MANDARIN dialects -- Study & teaching; CHINESE dialects; PHONEME (Linguistics); LEXICAL access; WORD recognition; CHINESE language
Publication: Language Resources & Evaluation, 2015, Vol 49, Issue 3, p581
ISSN: 1574-020X
Publication type: Article
DOI: 10.1007/s10579-015-9303-x
