Optimally Leveraging Density and Locality to Support LIMIT Queries
Albert Kim, Liqi Xu, Tarique Siddiqui, Silu Huang, Samuel Madden, Aditya Parameswaran

TL;DR
NeedleTail is a novel engine that enables rapid sampling of large datasets for LIMIT queries by using density maps and efficient algorithms, significantly reducing response time and memory usage.
Contribution
The paper introduces NeedleTail, a new approach combining density maps and algorithms to optimize LIMIT query performance in large-scale databases.
Findings
Returns results 4x faster on HDDs and 9x faster on SSDs on average
Uses up to 23x less memory than existing methods
Provides theoretical guarantees for locating promising data blocks
Abstract
Existing database systems are not optimized for queries with a LIMIT clause, operating instead in an all-or-nothing manner. In this paper, we propose a fast LIMIT query evaluation engine, called NeedleTail, aimed at letting analysts browse a small sample of the query results on large datasets as quickly as possible, independent of the overall size of the result set. NeedleTail introduces density maps, a lightweight in-memory indexing structure, and a set of efficient algorithms (with desirable theoretical guarantees) to quickly locate promising blocks, trading off locality and density. In settings where the samples are used to compute aggregates, we extend techniques from survey sampling to mitigate the bias in our samples. Our experimental results demonstrate that NeedleTail returns results 4x faster on HDDs and 9x faster on SSDs on average, while occupying up to 23x less memory than…
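To make the density-map idea concrete, here is a minimal Python sketch. It is not NeedleTail's implementation. It assumes a density map stores, for each block, the fraction of records holding each value of one column, and it uses a plain densest-first greedy pick. The function names `build_density_map` and `pick_blocks_by_density` and the dictionary layout are hypothetical. The sketch also ignores locality, whereas the abstract describes algorithms that trade off locality and density.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def build_density_map(blocks: List[List[Record]], column: str) -> List[Dict[Any, float]]:
    """Assumed layout: one dict per block mapping value -> fraction of records with that value."""
    density_map: List[Dict[Any, float]] = []
    for block in blocks:
        counts: Dict[Any, int] = {}
        for record in block:
            counts[record[column]] = counts.get(record[column], 0) + 1
        n = len(block)
        # An empty block has no matching records for any value.
        density_map.append({v: c / n for v, c in counts.items()} if n else {})
    return density_map


def pick_blocks_by_density(
    density_map: List[Dict[Any, float]],
    block_sizes: List[int],
    value: Any,
    limit: int,
) -> List[int]:
    """Greedy densest-first choice (illustrative only; locality is not considered)."""
    order = sorted(
        range(len(density_map)),
        key=lambda i: density_map[i].get(value, 0.0),
        reverse=True,
    )
    chosen: List[int] = []
    expected = 0.0  # expected number of matching records in the chosen blocks
    for i in order:
        density = density_map[i].get(value, 0.0)
        if density == 0.0 or expected >= limit:
            break
        chosen.append(i)
        expected += density * block_sizes[i]
    # Return block ids in storage order so they can be read in a single forward pass.
    return sorted(chosen)
```

Under these assumptions, a query such as `SELECT * FROM t WHERE c = 'a' LIMIT 3` would call `pick_blocks_by_density(density_map, block_sizes, "a", 3)` and read only the returned blocks rather than scanning the whole table.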
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Algorithms and Data Compression
