Airphant: Cloud-oriented Document Indexing
Supawit Chockchowwat, Chaitanya Sood, Yongjoo Park

TL;DR
Airphant introduces IoU Sketch, a new statistical index that lets cloud-based search engines lower query latency by issuing index lookups as parallel network requests; the paper reports up to 8.97x faster query latency than Apache Lucene.
Contribution
The paper presents IoU Sketch, a new index structure that reduces cloud search latency, and the Airphant system that leverages it for efficient keyword search in cloud environments.
Findings
Airphant achieves up to 8.97x faster query latency than Apache Lucene.
IoU Sketch reduces index lookup time through parallel asynchronous requests.
In the experiments, Airphant's end-to-end query latency ranges from 13 ms to 300 ms.
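The latency argument behind the findings above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Airphant's or IoU Sketch's actual implementation: `fetch`, `hierarchical_lookup`, `parallel_lookup`, and the 50 ms `ROUND_TRIP_S` constant are all hypothetical names and values chosen for illustration. It contrasts a hierarchical index, where each request depends on the previous response, with an index whose lookups are independent and can be issued concurrently.

```python
import asyncio
import time

# Assumed, simulated cloud-storage round-trip latency (not a measured value).
ROUND_TRIP_S = 0.05


async def fetch(key):
    """Hypothetical stand-in for one network request to cloud storage."""
    await asyncio.sleep(ROUND_TRIP_S)
    return f"block:{key}"


async def hierarchical_lookup(depth):
    """Each level's key is only known after the previous block arrives,
    so the requests must run one after another (depth round-trips)."""
    blocks = []
    key = "root"
    for level in range(depth):
        block = await fetch(key)
        blocks.append(block)
        key = f"child-of-level-{level}"  # placeholder for parsing the block
    return blocks


async def parallel_lookup(keys):
    """All keys are known up front, so the requests can be issued
    concurrently and finish in roughly one round-trip."""
    return await asyncio.gather(*(fetch(k) for k in keys))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_s = timed(hierarchical_lookup(3))
    _, parallel_s = timed(parallel_lookup(["a", "b", "c"]))
    print(f"hierarchical: {sequential_s:.3f}s, parallel: {parallel_s:.3f}s")
```

With three requests each, the hierarchical version waits for roughly three simulated round-trips while the concurrent version waits for roughly one, which is the effect the findings attribute to parallel asynchronous requests.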
Abstract
Modern data warehouses can scale compute nodes independently of storage. These systems persist their data on cloud storage, which is always available and cost-efficient. Ad-hoc compute nodes then fetch necessary data on-demand from cloud storage. This ability to quickly scale or shrink data systems is highly beneficial if query workloads may change over time. We apply this new architecture to search engines with a focus on optimizing their latencies in cloud environments. However, simply placing existing search engines (e.g., Apache Lucene) on top of cloud storage significantly increases their end-to-end query latencies (i.e., more than 6 seconds on average in one of our studies). This is because their indexes can incur multiple network round-trips due to their hierarchical structure (e.g., skip lists, B-trees, learned indexes). To address this issue, we develop a new statistical index…
Taxonomy
Topics: Caching and Content Delivery · Data Quality and Management · Cloud Computing and Resource Management
