Aggregate Estimation Over Dynamic Hidden Web Databases
Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

TL;DR
This paper introduces methods for estimating and tracking aggregates over dynamic hidden web databases, which are frequently updated and enforce strict query limits (e.g., at most 1,000 search queries per day). The proposed algorithms are supported by theoretical analysis and real-world experiments.
Contribution
It proposes algorithms designed specifically for dynamic hidden web databases, improving on earlier estimation techniques that assume a static database while staying within the query-cost limits that real sites enforce.
Findings
Algorithms outperform baseline static methods
Effective in tracking database updates over time
Validated through extensive real-world experiments
Abstract
Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A problem with this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over…
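To make the setting in the abstract concrete, the following is a minimal sketch of how a COUNT(*) aggregate might be estimated through a restrictive interface under a query budget. It is not the algorithm from this paper: it illustrates a random drill-down estimator of the kind used for static databases, and it does not handle updates. Everything in it is an assumption made for illustration: the `HiddenDB` class, its `search` method, the rule that the interface shows at most `k` tuples plus an overflow flag, and the restriction to distinct tuples with Boolean attributes.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]  # one tuple of Boolean (0/1) attribute values


class HiddenDB:
    """Hypothetical stand-in for a form-like search interface (an assumption)."""

    def __init__(self, rows: Sequence[Row], k: int, budget: int) -> None:
        self.rows = list(rows)   # hidden contents, assumed distinct
        self.k = k               # assumed cap on tuples shown per query
        self.budget = budget     # e.g. 1,000 search queries per day
        self.used = 0

    def search(self, prefix: Row) -> Tuple[List[Row], bool]:
        """Return up to k tuples whose first attributes equal prefix, plus an overflow flag."""
        if self.used >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.used += 1
        matches = [r for r in self.rows if r[:len(prefix)] == prefix]
        return matches[:self.k], len(matches) > self.k


def drill_down_count(db: HiddenDB, n_attrs: int, rng: random.Random) -> float:
    """One random walk: add predicates until the query stops overflowing.

    The returned value is (tuples shown) / (probability of reaching this query),
    which is an unbiased estimate of COUNT(*) when tuples are distinct and k >= 1.
    """
    prefix: Row = ()
    prob = 1.0
    while True:
        shown, overflow = db.search(prefix)
        if not overflow:
            return len(shown) / prob
        if len(prefix) == n_attrs:
            raise ValueError("fully specified query overflowed; tuples are not distinct")
        prefix += (rng.randint(0, 1),)  # pick the next attribute value uniformly
        prob *= 0.5


def estimate_count(db: HiddenDB, n_attrs: int, seed: int = 0) -> float:
    """Average drill-down estimates until another full walk no longer fits the budget."""
    rng = random.Random(seed)
    estimates = []
    while db.budget - db.used >= n_attrs + 1:  # a walk issues at most n_attrs + 1 queries
        estimates.append(drill_down_count(db, n_attrs, rng))
    if not estimates:
        raise ValueError("budget too small for a single walk")
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    data_rng = random.Random(1)
    rows = list({tuple(data_rng.randint(0, 1) for _ in range(10)) for _ in range(300)})
    db = HiddenDB(rows, k=5, budget=1000)
    print("true count:", len(rows), "estimate:", estimate_count(db, n_attrs=10))
```

The sketch treats the database as fixed while the budget is spent. When tuples are inserted and deleted between rounds, as in the dynamic setting the abstract describes, this static assumption no longer holds, and that is the gap the paper addresses.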
Taxonomy
Topics: Web Data Mining and Analysis · Caching and Content Delivery · Data Stream Mining Techniques
