Discovering the Skyline of Web Databases
Abolfazl Asudeh, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

TL;DR
This paper introduces algorithms for discovering skyline tuples in hidden web databases with top-$k$ constraints, enabling better data insights and third-party applications.
Contribution
It presents algorithms tailored to different types of search interface controls and combines them to handle mixed interfaces.
Findings
Algorithms outperform baseline solutions in efficiency.
Effective across various interface types.
Validated through real-world experiments.
Abstract
Many web databases are "hidden" behind proprietary search interfaces that enforce the top-$k$ output constraint, i.e., each query returns at most $k$ of all matching tuples, preferentially selected and returned according to a proprietary ranking function. In this paper, we initiate research into the novel problem of skyline discovery over top-$k$ hidden web databases. Since skyline tuples provide critical insights into the database and include the top-ranked tuple for every possible ranking function following the monotonic order of attribute values, skyline discovery from a hidden web database can enable a wide variety of innovative third-party applications over one or multiple web databases. Our research in the paper shows that the critical factor affecting the cost of skyline discovery is the type of search interface controls provided by the website. As such, we develop efficient…
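To make the two notions in the abstract concrete, here is a minimal sketch of a skyline computation and of a top-$k$ interface. It is not the paper's discovery algorithm: it assumes full access to the data, assumes smaller values are better on every attribute, and uses hypothetical names (`dominates`, `skyline`, `top_k_query`) chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Row, b: Row) -> bool:
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(rows: Sequence[Row]) -> List[Row]:
    """Return the rows that no other row dominates (quadratic scan)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows)]


def top_k_query(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    rank: Callable[[Row], float],
    k: int,
) -> List[Row]:
    """Simulate a hidden database: filter by the query predicate, order
    by a ranking function, and return at most k tuples."""
    matches = [r for r in rows if predicate(r)]
    return sorted(matches, key=rank)[:k]


# Hypothetical (price, mileage) tuples, both to be minimized.
cars = [(1, 9), (3, 3), (9, 1), (5, 5), (4, 8)]
sky = skyline(cars)
best = top_k_query(cars, lambda r: True, rank=sum, k=1)
```

With this data, `sky` holds the three non-dominated tuples, and the single tuple returned under the monotonic ranking `sum` is one of them, which illustrates why the skyline contains the top-ranked tuple for any monotonic ranking function.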
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Geographic Information Systems Studies
