- Title
The automatic generation of thesauri of related words for English, French, German, and Russian.
- Authors
Rapp, Reinhard
- Abstract
A method for the automatic extraction of words with similar meanings is presented, based on the analysis of word distribution in large monolingual text corpora. It involves compiling matrices of word co-occurrences and reducing the dimensionality of the semantic space by means of a singular value decomposition. In this way, problems of data sparseness are reduced and a generalization effect is achieved that considerably improves the results. The method is largely language-independent and has been applied to corpora of English, French, German, and Russian; the resulting thesauri are freely available. The English thesaurus was evaluated by comparing it with experimental results obtained from test persons who were asked to judge word similarities. According to this evaluation, the machine-generated results come close to the performance of native speakers.
- Subjects
THESAURI; VOCABULARY; CORPORA; GERMAN vocabulary; FRENCH vocabulary; RUSSIAN language; ENGLISH language
- Publication
International Journal of Speech Technology, 2008, Vol 11, Issue 3/4, p147
- ISSN
1381-2416
- Publication type
Article
- DOI
10.1007/s10772-009-9043-7
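
- Illustrative sketch
The pipeline outlined in the abstract (co-occurrence matrix, singular value decomposition, similarity ranking) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the toy corpus, the window size, the number of retained dimensions `k`, the use of raw counts without any weighting, the cosine similarity measure, and all function names are assumptions made here for demonstration.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Count how often each word pair co-occurs within +/- `window` tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo = max(0, i - window)
            hi = min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def reduce_dimensions(counts, k):
    """Keep the k strongest SVD dimensions (assumed value of k, not from the paper)."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def most_similar(word, vocab, vectors, top_n=3):
    """Rank the other words by cosine similarity in the reduced space."""
    index = vocab.index(word)
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = vectors / norms[:, None]
    sims = unit @ unit[index]
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if i != index][:top_n]

# Hypothetical toy corpus; a real run would use a large monolingual corpus.
sentences = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases mice".split(),
    "the dog chases cats".split(),
]
vocab, counts = cooccurrence_matrix(sentences)
vectors = reduce_dimensions(counts, k=3)
print(most_similar("cat", vocab, vectors))
```

On a corpus this small the rankings carry no linguistic weight; the sketch only shows how the steps connect. The generalization effect mentioned in the abstract would be expected to appear only with far more data and a much larger matrix.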