Automatic Wrapper Adaptation by Tree Edit Distance Matching
Emilio Ferrara, Robert Baumgartner

TL;DR
This paper introduces a novel method for automatically adapting web data extraction wrappers by measuring tree similarity with improved tree edit distance techniques, enhancing robustness against webpage structure changes.
Contribution
The paper proposes a new approach for automatic wrapper adaptation using advanced tree edit distance matching, improving robustness to webpage structural changes.
Findings
Effective adaptation to webpage structure changes
Improved accuracy in wrapper matching
Enhanced robustness of data extraction
Abstract
Information distributed through the Web keeps growing faster day by day, and for this reason several techniques for extracting Web data have been suggested in recent years. Extraction tasks are often performed through so-called wrappers, procedures that extract information from Web pages, e.g. by implementing logic-based techniques. Many fields of application today require highly robust wrappers, so as not to compromise assets of information or the reliability of the extracted data. Unfortunately, a wrapper may fail to extract data from a Web page if the page's structure changes, sometimes even slightly, so new techniques are needed to automatically adapt the wrapper to the new structure of the page in case of failure. In this work we present a novel approach to automatic wrapper adaptation based on the measurement of similarity…
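To make the idea of measuring similarity between page trees concrete, here is a minimal sketch of the classic Simple Tree Matching algorithm, a restricted top-down form of tree edit distance matching. It is an illustration under stated assumptions, not necessarily the algorithm the paper proposes: the `Node` class, the `similarity` normalization (matched nodes divided by the average tree size), and the toy trees are all hypothetical names and choices introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down, order-preserving matching of two trees."""
    if a.tag != b.tag:
        return 0  # roots differ, so nothing below them can be matched
    m, n = len(a.children), len(b.children)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b (dynamic programming, as in LCS).
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched nodes over the average tree size."""
    return simple_tree_matching(a, b) / ((size(a) + size(b)) / 2)


# Toy example: the new page version dropped one <p> from the <div>.
old_page = Node("html", [Node("body", [Node("div", [Node("p"), Node("p")]), Node("span")])])
new_page = Node("html", [Node("body", [Node("div", [Node("p")]), Node("span")])])

matched = simple_tree_matching(old_page, new_page)
score = similarity(old_page, new_page)
```

In a wrapper-adaptation setting, a score like this could be compared against a threshold to decide whether a subtree in the changed page still corresponds to the element the wrapper originally targeted; the threshold and how candidates are selected are left open here.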
