- Title
Creating a system for lexical substitutions from scratch using crowdsourcing.
- Authors
Biemann, Chris
- Abstract
This article describes the creation and application of the Turk Bootstrap Word Sense Inventory for 397 frequent nouns, which is a publicly available resource for lexical substitution. This resource was acquired using Amazon Mechanical Turk. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then, more contexts are collected. Contexts that cannot be assigned to a current target word's sense inventory re-enter the bootstrapping loop and get a supply of substitutions. This process yields a sense inventory with its granularity determined by substitutions as opposed to psychologically motivated concepts. It comes with a large number of sense-annotated target word contexts. Evaluation of data quality shows that the process is robust against noise from the crowd, produces a less fine-grained inventory than WordNet, and provides a rich body of high-precision substitution data at low cost. Using the data to train a system for lexical substitutions, we show that the amount and quality of the data are sufficient for producing high-quality substitutions automatically. In this system, co-occurrence cluster features are employed as a means to cheaply model topicality.
- Subjects
LEXICOLOGY; SUBSTITUTION (Linguistics); CROWDSOURCING; WORD (Linguistics); LINGUISTICS
- Publication
Language Resources & Evaluation, 2013, Vol 47, Issue 1, p97
- ISSN
1574-020X
- Publication type
Article
- DOI
10.1007/s10579-012-9180-5