- Title
Towards a Historical Treebank of Middle and Early Modern Welsh, Part I: Workflow and POS Tagging.
- Authors
Meelen, Marieke; Willis, David
- Abstract
This article introduces the working methods of the Parsed Historical Corpus of the Welsh Language (PARSHCWL). The corpus is designed to provide researchers with a tool for automatic exhaustive extraction of instances of grammatical structures from Middle and Modern Welsh texts in a way comparable to similar tools that already exist for various European languages. The major features of the corpus are outlined, along with the overall architecture of the workflow needed for a team of researchers to produce it. In this paper, the first two stages of the process, namely pre-processing of texts and automated part-of-speech (POS) tagging, are discussed in some detail, focusing in particular on major issues involved in defining word boundaries and in defining a robust and useful tagset.
- Subjects
CORPORA; WELSH language; WORKFLOW; HISTORICAL linguistics; ROMANCE languages
- Publication
Journal of Celtic Linguistics, 2021, Vol 22, p125
- ISSN
0962-1377
- Publication type
Article
- DOI
10.16922/jcl.22.6