- Title
Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction.
- Authors
Paquot, Magali; Bestgen, Yves
- Abstract
Most studies that make use of keyword analysis rely on log-likelihood ratio or chi-square tests to extract words that are particularly characteristic of a corpus (e.g. Scott and Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2006: 110). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes and Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. Wilcoxon-Mann-Whitney test) but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that statistical tests for keyword extraction have rarely been compared. In this article, we make use of the log-likelihood ratio, the t-test and the Wilcoxon-Mann-Whitney test in turn to compare the academic and the fiction sub-corpora of the British National Corpus and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. number of keywords extracted by each measure, percentage of keywords that are shared in the three lists, frequency and distribution of academic keywords in the two corpora) and explore the specificities of the three statistical measures. We also assess the advantages and disadvantages of these measures for the extraction of general academic words.
- Subjects
ACADEMIC discourse; CORPORA; SPEECH; KEYWORDS; DISTRIBUTION (Probability theory)
- Publication
Language & Computers, 2009, Vol 68, Issue 1, p247
- ISSN
0921-5034
- Publication type
Article
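
As an illustration of the abstract's point that the log-likelihood ratio is computed from absolute frequencies only, the following is a minimal sketch of the two-corpus log-likelihood keyness score in its commonly used form. It is not taken from the article: the function name `log_likelihood` and all counts in the usage note are hypothetical. The function takes only the total frequency of a word in each corpus and the size of each corpus, so it cannot tell whether the occurrences are spread over many texts or concentrated in a few.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood keyness of one word across two corpora.

    freq_a, freq_b: absolute frequency of the word in corpus A and corpus B.
    size_a, size_b: total number of tokens in corpus A and corpus B.
    All names and the two-term formulation are assumptions for illustration.
    """
    total = freq_a + freq_b
    # Expected frequencies if the word were equally likely in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)

    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        # A zero observed count contributes nothing (0 * ln 0 is taken as 0).
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score
```

With made-up counts, `log_likelihood(50, 1000, 5, 1000)` gives a score well above 3.84, the chi-square critical value for p < 0.05 with one degree of freedom, while `log_likelihood(10, 1000, 10, 1000)` gives 0. Dispersion-sensitive alternatives such as the t-test or the Wilcoxon-Mann-Whitney test would instead take one frequency value per text as input.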