- Title
InfoXtract: A customizable intermediate level information extraction engine.
- Authors
ROHINI K. SRIHARI; WEI LI; THOMAS CORNELL; CHENG NIU
- Abstract
Information Extraction (IE) systems assist analysts to assimilate information from electronic documents. This paper focuses on IE tasks designed to support information discovery applications. Since information discovery implies examining large volumes of heterogeneous documents for situations that cannot be anticipated a priori, they require IE systems to have breadth as well as depth. This implies the need for a domain-independent IE system that can easily be customized for specific domains: end users must be given tools to customize the system on their own. It also implies the need for defining new intermediate-level IE tasks that are richer than the subject-verb-object (SVO) triples produced by shallow systems, yet not as complex as the domain-specific scenarios defined by the Message Understanding Conference (MUC). This paper describes InfoXtract, a robust, scalable, intermediate-level IE engine that can be ported to various domains. It describes new IE tasks such as synthesis of entity profiles, and extraction of concept-based general events which represent realistic near-term goals focused on deriving useful, actionable information. Entity profiles consolidate information about a person/organization/location etc. within a document and across documents into a single template; this takes into account aliases and anaphoric references as well as key relationships and events pertaining to that entity. Concept-based events attempt to normalize information such as time expressions (e.g., yesterday) as well as ambiguous location references (e.g., Buffalo). These new tasks facilitate the correlation of output from an IE engine with structured data to enable text mining. InfoXtract's hybrid architecture comprised of grammatical processing and machine learning is described in detail. Benchmarking results for the core engine and applications utilizing the engine are presented.
- Subjects
DATA mining; ELECTRONIC records; INFORMATION storage & retrieval systems; ENGINEERING databases; ENGINE (Information retrieval system); MACHINE learning
- Publication
Natural Language Engineering, 2008, Vol 14, Issue 1, p33
- ISSN
1351-3249
- Publication type
Article
- DOI
10.1017/S1351324906004116