- Title
Interactive learning of node selecting tree transducer.
- Authors
Julien Carme; Rémi Gilleron; Aurélien Lemay; Joachim Niehren
- Abstract
We develop new algorithms for learning monadic node selection queries in unranked trees from annotated examples, and apply them to visually interactive Web information extraction. We propose to represent monadic queries by bottom-up deterministic Node Selecting Tree Transducers (NSTTs), a particular class of tree automata that we introduce. We prove that deterministic NSTTs capture the class of queries definable in monadic second order logic (MSO) in trees, which Gottlob and Koch (2002) argue to have the right expressiveness for Web information extraction, and prove that monadic queries defined by NSTTs can be answered efficiently. We present a new polynomial time algorithm in RPNI style that learns monadic queries defined by deterministic NSTTs from completely annotated examples, where all selected nodes are distinguished.
In practice, users prefer to provide partial annotations. We propose to account for partial annotations by intelligent tree pruning heuristics. We introduce pruning NSTTs, a formalism that shares many advantages of NSTTs. This leads us to an interactive learning algorithm for monadic queries defined by pruning NSTTs, which satisfies a new formal active learning model in the style of Angluin (1987).
We have implemented our interactive learning algorithm and integrated it into a visually interactive Web information extraction system, called SQUIRREL, by plugging it into the Mozilla Web browser. Experiments on realistic Web documents confirm excellent quality with very few user interactions during wrapper induction.
- Subjects
LEARNING; TRANSDUCERS; INTERNET in education; WEB browsers
- Publication
Machine Learning, 2007, Vol 66, Issue 1, p33
- ISSN
0885-6125
- Publication type
Article
- DOI
10.1007/s10994-006-9613-8