The Performance Envelope of Inverted Indexing on Modern Hardware
Jimmy Lin, Lori Paniak, and Gordon Boerke

TL;DR
This study evaluates the performance limits of inverted indexing on modern hardware, revealing that physical media characteristics dominate throughput and that current techniques are nearing physical device constraints.
Contribution
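The paper benchmarks Lucene's inverted indexing throughput on a single high-end multi-core commodity server, varying the source and target media across a network-attached store, a direct-attached disk array, and an SSD. It identifies the physical media as the primary bottleneck and argues that further gains require rethinking the indexing pipeline.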
Findings
The physical characteristics of the source and target media are the largest determinants of indexing throughput.
Physically isolating the source and target media yields the highest indexing throughput.
Current indexing techniques are approaching physical device limits.
Abstract
This paper explores the performance envelope of "traditional" inverted indexing on modern hardware using the implementation in the open-source Lucene search library. We benchmark indexing throughput on a single high-end multi-core commodity server in a number of configurations varying the media of the source collection and target index, examining a network-attached store, a direct-attached disk array, and an SSD. Experiments show that the largest determinants of performance are the physical characteristics of the source and target media, and that physically isolating the two yields the highest indexing throughput. Results suggest that current indexing techniques have reached physical device limits, and that further algorithmic improvements in performance are unlikely without rethinking the inverted indexing pipeline in light of observed bottlenecks.
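To make the measured quantity concrete, the following is a minimal sketch of building an inverted index and timing it. It is not Lucene's implementation and not the authors' benchmark harness: the function names `build_inverted_index` and `measure_throughput`, the whitespace tokenizer, and the three sample documents are all assumptions introduced for illustration.

```python
import time
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def measure_throughput(docs):
    """Return (index, bytes indexed per second) for an in-memory run."""
    total_bytes = sum(len(d.encode("utf-8")) for d in docs)
    start = time.perf_counter()
    index = build_inverted_index(docs)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    return index, total_bytes / max(elapsed, 1e-9)


# Illustrative documents only; not data from the paper.
docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
index, throughput = measure_throughput(docs)
print(index["quick"])
print(f"{throughput:.0f} bytes/s")
```

This toy run stays entirely in memory, so it captures only the CPU side of indexing. The configurations described in the abstract also include reading the collection from a source medium and writing the index to a target medium, which is where the paper reports the dominant costs.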
Taxonomy
Topics: Peer-to-Peer Network Technologies · Caching and Content Delivery · Data Management and Algorithms
Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD
