- Title
The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights.
- Authors
Davies, Mark
- Abstract
The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus — which contains more than 385 million words from 1990–2008 (20 million words each year), balanced between spoken, fiction, popular magazines, newspapers, and academic journals. We also discuss the unique relational databases architecture, which allows for a wide range of queries that are not available (or are quite difficult) with other architectures and interfaces. To conclude, we consider insights from the corpus on a number of cases of genre-based variation and recent linguistic variation, including an extended analysis of phrasal verbs in contemporary American English.
- Subjects
AMERICAN English language; PUBLISHING; INFORMATION storage & retrieval systems; DATABASES; VERBS; PHRASEOLOGY; LANGUAGE & languages; ENGLISH language databases
- Publication
International Journal of Corpus Linguistics, 2009, Vol 14, Issue 2, p159
- ISSN
1384-6655
- Publication type
Article
- DOI
10.1075/ijcl.14.2.02dav