- Title
A Task-specific Approach for Crawling the Deep Web.
- Authors
Álvarez, Manuel; Raposo, Juan; Cacheda, Fidel; Pan, Alberto
- Abstract
There is a great amount of valuable information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually known as the Deep Web or the Hidden Web. Most probably, the highest-value information in the deep web is that behind web forms. In this paper, we describe a prototype hidden-web crawler able to access such content. Our approach is based on providing the crawler with a set of domain definitions, each one describing a specific data-collecting task. The crawler uses these descriptions to identify relevant query forms and to learn to execute queries on them. We have tested our techniques on several real-world tasks, obtaining a high degree of effectiveness.
- Subjects
PROTOTYPES; ENGINEERING databases; INFORMATION storage & retrieval systems; ENGINE (Information retrieval system); ENGINEERING
- Publication
Engineering Letters, 2006, Vol 13, Issue 3, p204
- ISSN
1816-093X
- Publication type
Article