Paquot, Magali; Bestgen, Yves

Title: Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction.
Authors: Paquot, Magali; Bestgen, Yves
Abstract: Most studies that make use of keyword analysis rely on log-likelihood ratio or chi-square tests to extract words that are particularly characteristic of a corpus (e.g. Scott and Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2006: 110). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes and Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. Wilcoxon-Mann-Whitney test) but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that statistical tests for keyword extraction have rarely been compared. In this article, we make use of the log-likelihood ratio, the t-test and the Wilcoxon-Mann-Whitney test in turn to compare the academic and the fiction sub-corpora of the British National Corpus and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. number of keywords extracted by each measure, percentage of keywords that are shared in the three lists, frequency and distribution of academic keywords in the two corpora) and explore the specificities of the three statistical measures. We also assess the advantages and disadvantages of these measures for the extraction of general academic words.
Subjects: ACADEMIC discourse; CORPORA; SPEECH; KEYWORDS; DISTRIBUTION (Probability theory)
Publication: Language & Computers, 2009, Vol 68, Issue 1, p247
ISSN: 0921-5034
Publication type: Article
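
Illustrative note (not part of the original record): the abstract contrasts a measure computed from absolute corpus frequencies (log-likelihood ratio) with tests that work on per-text values (t-test, Wilcoxon-Mann-Whitney). The sketch below is one possible way to see that contrast. It is not the authors' implementation. The toy counts are invented for illustration, the t-test is assumed to be the Welch (unequal-variance) variant, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio keyness from absolute frequencies.

    a, b: frequency of the word in corpus 1 and corpus 2
    c, d: total token counts of corpus 1 and corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def welch_t(xs, ys):
    """Welch t statistic on per-text relative frequencies (assumed variant)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def mann_whitney_u(xs, ys):
    """U statistic for xs: pairs where x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)


# Invented toy data: five texts of 1000 tokens per corpus.
# In corpus A the word occurs 50 times, all in a single text.
TEXT_SIZE = 1000
counts_a = [50, 0, 0, 0, 0]
counts_b = [2, 3, 1, 2, 2]

rel_a = [n / TEXT_SIZE for n in counts_a]
rel_b = [n / TEXT_SIZE for n in counts_b]

ll = log_likelihood(sum(counts_a), sum(counts_b),
                    TEXT_SIZE * len(counts_a), TEXT_SIZE * len(counts_b))
t = welch_t(rel_a, rel_b)
u = mann_whitney_u(rel_a, rel_b)

print("log-likelihood:", ll)
print("Welch t:", t)
print("Mann-Whitney U:", u, "of", len(rel_a) * len(rel_b), "pairs")
```

With these invented counts, the log-likelihood value is far above 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), because it sees only the pooled totals of 50 versus 10. The two per-text statistics give no comparable signal, since the word is concentrated in one text of corpus A.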
