- Title
AUTOMATIC PATTERN CONSTRUCTION FOR WEB INFORMATION EXTRACTION.
- Authors
GAO, XIAOYING; ZHANG, MENGJIE; ANDREAE, PETER
- Abstract
This paper describes a domain-independent approach for automatically constructing information extraction patterns for semi-structured web pages. Given a randomly chosen page from a web site of similarly structured pages, the system identifies a region of the page that has a regular "tabular" structure, and then infers an extraction pattern that will match the "rows" of the region and identify the data elements. The approach was tested on three corpora containing a series of tabular web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.
- Subjects
INFORMATION technology; WEBSITES; ARTIFICIAL intelligence; SOFTWARE shells; COMPUTER software; LABELING-machines
- Publication
International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems, 2004, Vol 12, Issue 4, p447
- ISSN
0218-4885
- Publication type
Article
- DOI
10.1142/S0218488504002928
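
The abstract's two steps (find a region with regular "tabular" structure, then infer a pattern matching its "rows") can be illustrated with a toy sketch. This is not the authors' algorithm, which the record does not describe in detail; it is a simple stand-in heuristic that assumes rows are sibling elements with identical tag structure. All names here (`Node`, `TreeBuilder`, `signature`, `infer_rows`, `SAMPLE_PAGE`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class Node:
    """Minimal tree node; text is stored as a child with tag '#text'."""
    def __init__(self, tag, parent=None, value=None):
        self.tag, self.parent, self.value = tag, parent, value
        self.children = []


class TreeBuilder(HTMLParser):
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest matching open element; ignore stray closes.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def signature(node):
    """Structural signature: the tag plus the signatures of all children."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def leaf_texts(node):
    if node.tag == "#text":
        return [node.value]
    return [t for c in node.children for t in leaf_texts(c)]


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def infer_rows(html):
    """Return (row pattern, extracted rows) for the most regular region found."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best_score, region, pattern = 0, None, None
    for node in iter_nodes(builder.root):
        counts = Counter(signature(c) for c in node.children if c.tag != "#text")
        if not counts:
            continue
        sig, count = counts.most_common(1)[0]
        # Assumed scoring: repeated rows times text fields per row.
        score = count * sig.count("#text")
        if count >= 2 and score > best_score:
            best_score, region, pattern = score, node, sig
    if region is None:
        return None, []
    rows = [leaf_texts(c) for c in region.children if signature(c) == pattern]
    return pattern, rows


SAMPLE_PAGE = """
<html><body><h1>Books</h1>
<table>
<tr><td>Dune</td><td>Herbert</td><td>1965</td></tr>
<tr><td>Emma</td><td>Austen</td><td>1815</td></tr>
<tr><td>Ulysses</td><td>Joyce</td><td>1922</td></tr>
</table>
<p>Contact us</p></body></html>
"""

if __name__ == "__main__":
    pattern, rows = infer_rows(SAMPLE_PAGE)
    print(pattern)
    for row in rows:
        print(row)
```

Like the system in the abstract, the sketch works from a single unlabeled page, but exact signature matching is far more brittle than a learned pattern: a row with a missing cell or extra markup would not match. A real system would need tolerant matching.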