Aggregate Estimation Over Dynamic Hidden Web Databases

Weimo Liu; Saravanan Thirumuruganathan; Nan Zhang; Gautam Das

arXiv:1403.2763 · cs.DB · May 2, 2014 · 1 citation

TL;DR

This paper introduces algorithms for accurately estimating and tracking aggregates over dynamic hidden web databases, which are frequently updated and enforce strict limits on the number of search queries. The algorithms are validated through theoretical analysis and real-world experiments.

Contribution

The paper proposes algorithms designed specifically for dynamic hidden web databases. They improve on estimation techniques that assume a static database while respecting the query-cost limits that real-world web interfaces enforce.

Findings

1. Algorithms outperform static baseline methods
2. Effective in tracking database updates over time
3. Validated through extensive real-world experiments

Abstract

Many databases on the web are "hidden" behind (i.e., accessible only through) their restrictive, form-like search interfaces. Recent studies have shown that it is possible to estimate aggregate query answers over such hidden web databases by issuing a small number of carefully designed search queries through the restrictive web interface. A limitation of this existing work, however, is that it assumes the underlying database to be static, while most real-world web databases (e.g., Amazon, eBay) are frequently updated. In this paper, we study the novel problem of estimating/tracking aggregates over dynamic hidden web databases while adhering to the stringent query-cost limitation they enforce (e.g., at most 1,000 search queries per day). Theoretical analysis and extensive real-world experiments demonstrate the effectiveness of our proposed algorithms and their superiority over…
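To make the setting concrete, here is a minimal sketch under stated assumptions. Everything in it is hypothetical. `HiddenDB` simulates a database of distinct tuples over Boolean attributes behind a top-k conjunctive search form. The form reports whether a query "overflows" (matches more than k tuples) and enforces a daily query budget. The estimator, `drill_down_count`, is the standard static random drill-down used in earlier hidden-database work. It descends by fixing attribute values at random until a query stops overflowing, then divides that query's result size by the probability of reaching it, a Horvitz-Thompson-style weighting that makes the COUNT(*) estimate unbiased. This is the kind of static technique the abstract contrasts with. The authors' dynamic algorithms are not reproduced here. The "next day" step simply reruns the static estimator from scratch after the data changes.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


class HiddenDB:
    """Hypothetical hidden database: distinct Boolean tuples behind a top-k
    conjunctive search form with a per-day query budget."""

    def __init__(self, rows: List[Row], k: int, budget: int) -> None:
        self.rows = list(rows)
        self.k = k
        self.budget = budget
        self.queries_used = 0

    def new_day(self) -> None:
        """Reset the query counter, as a daily limit would."""
        self.queries_used = 0

    def search(self, predicate: Dict[int, int]) -> Tuple[List[Row], bool]:
        """Return at most k matching rows plus an overflow flag (> k matches)."""
        if self.queries_used >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries_used += 1
        matches = [r for r in self.rows
                   if all(r[a] == v for a, v in predicate.items())]
        return matches[: self.k], len(matches) > self.k


def drill_down_count(db: HiddenDB, num_attrs: int, rng: random.Random) -> float:
    """One unbiased COUNT(*) estimate: descend randomly until a query stops
    overflowing, then scale its result size by 1 / P(reaching that query)."""
    predicate: Dict[int, int] = {}
    prob = 1.0
    while True:
        rows, overflow = db.search(predicate)
        if not overflow:
            return len(rows) / prob
        if len(predicate) == num_attrs:
            raise ValueError("more than k identical tuples; rows must be distinct")
        predicate[len(predicate)] = rng.randint(0, 1)  # fix next attribute at random
        prob *= 0.5


def estimate_count(db: HiddenDB, num_attrs: int, rng: random.Random) -> float:
    """Average as many drill-downs as the remaining daily budget allows."""
    estimates = []
    # A single drill-down issues at most num_attrs + 1 queries.
    while db.budget - db.queries_used > num_attrs:
        estimates.append(drill_down_count(db, num_attrs, rng))
    if not estimates:
        raise RuntimeError("budget too small for a single drill-down")
    return sum(estimates) / len(estimates)


rng = random.Random(7)
rows = sorted({tuple(rng.randint(0, 1) for _ in range(10)) for _ in range(300)})
db = HiddenDB(rows, k=5, budget=1000)  # e.g., 1,000 queries per day
print(len(db.rows), round(estimate_count(db, 10, rng)))

# Next day: the database has changed and the budget resets, so the static
# estimator starts over from scratch and reuses nothing from yesterday.
db.rows = db.rows[50:]
db.new_day()
print(len(db.rows), round(estimate_count(db, 10, rng)))
```

Rerunning from scratch spends the whole daily budget again even when only a few tuples changed. That cost is what motivates studying estimators designed for dynamic databases.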

Taxonomy

Topics: Web Data Mining and Analysis · Caching and Content Delivery · Data Stream Mining Techniques