- Title
Exploiting fuzzy tree fragment queries in the investigation of parsed corpora.
- Authors
Wallis, S; Nelson, G
- Abstract
The production of collections of grammatically analysed linguistic samples, or corpora, entails a parallel research effort in tools for exploiting such collections. This paper summarizes the new one million-word parsed British Component of the International Corpus of English (ICE-GB). We then introduce a tool designed for annotating, managing, and investigating the parsed corpus, called ICECUP III. Effective grammatical exploration is driven by a query. We therefore pose the question, what is the optimum representation for expressing queries on a parsed corpus? We review existing grammatical query systems and a feasible representation in logic. We describe ICECUP's query representation, the Fuzzy Tree Fragment (FTF), which has proven extremely effective in supporting the manual correction of automatically annotated material in ICE-GB and is currently used for linguistic research. Central to the idea of FTFs is that they should be as easy to understand as possible and support researchers as they engage with the complexity of the parsed analysis. An FTF, therefore, is a kind of 'abstracted model' of a tree. We argue that FTFs are highly applicable to general linguistic research and teaching with grammatically analysed corpora. They support an exploratory approach that does not presume a detailed knowledge of the grammar in advance of search. Exploration is supported by a number of facilities, including a tool that abstracts a query from part of a corpus tree. Finally, we discuss how FTFs may be used to perform 'key construction in line' concordancing, and the use of logic to combine queries.
- Subjects
FUZZY languages; CORPORA; PARSING (Grammar); ENGLISH language -- Comparison; ENGLISH language education
- Publication
Literary & Linguistic Computing, 2000, Vol 15, Issue 3, p339
- ISSN
0268-1145
- Publication type
Article
- DOI
10.1093/llc/15.3.339