Fast Query Processing by Distributing an Index over CPU Caches
Xiaoqin Ma, Gene Cooperman

TL;DR
This paper distributes a large sorted tree index across the CPU caches of a cluster's nodes and routes lookups over the network, reducing cache misses and speeding up query processing.
Contribution
It proposes splitting a large tree index across cluster nodes and using network messages to access cache-resident parts, reducing cache misses compared to local traversal.
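The split-and-route idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it stands in a sorted list for the tree, uses contiguous key ranges as the partitioning rule, and replaces the network message with a plain function call. The names `split_index`, `owner_of`, and `lookup` are hypothetical.

```python
from bisect import bisect_right

def split_index(sorted_keys, num_nodes):
    """Split a sorted key list into at most num_nodes contiguous slices."""
    if not sorted_keys:
        return []
    size = -(-len(sorted_keys) // num_nodes)  # ceiling division
    return [sorted_keys[i:i + size] for i in range(0, len(sorted_keys), size)]

def owner_of(boundaries, key):
    """Return the index of the slice whose key range covers `key`."""
    # boundaries[i] is the smallest key held by slice i.
    return max(bisect_right(boundaries, key) - 1, 0)

def lookup(slices, boundaries, key):
    """Route `key` to its owning slice, then search only that slice."""
    node = owner_of(boundaries, key)    # assumption: stands in for a network send
    part = slices[node]                 # assumed small enough to stay cache-resident
    pos = bisect_right(part, key) - 1   # largest stored key <= key
    return node, (part[pos] if pos >= 0 else None)

keys = list(range(0, 100, 5))           # 20 sorted keys: 0, 5, ..., 95
slices = split_index(keys, 4)           # 4 hypothetical cluster nodes
boundaries = [s[0] for s in slices]     # [0, 25, 50, 75]

print(lookup(slices, boundaries, 42))   # (1, 40)
```

Each node only ever searches its own slice, so the working set per node is a fraction of the full index; whether that fraction fits in a CPU cache depends on the index size and the number of nodes.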
Findings
Distributed index reduces cache misses and improves query speed.
Network-based access to cache-resident parts of the index often outperforms local traversal that incurs cache misses.
Approach is effective when the index fits in cluster-wide CPU caches.
Abstract
Data-intensive applications on clusters often require that requests be sent quickly to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the responsible node for accessing or storing the data. Examples include object tracking in sensor networks, packet routing over the internet, request processing in publish-subscribe middleware, and query processing in database systems. When the tree structure is larger than the CPU cache, the standard implementation potentially incurs many cache misses for each lookup: one cache miss at each successive level of the tree. As the CPU-RAM gap grows, this performance degradation will only worsen. We propose a solution that takes advantage of the growing speed of local area networks for clusters. We split the sorted tree structure among the nodes of the cluster. We assume…
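The "one cache miss at each successive level" cost can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a result from the paper: it counts the levels of a balanced tree for a given fanout and treats that count as the worst-case number of misses per lookup when no level is cached. The function name `tree_levels` and the key count and fanouts used are hypothetical.

```python
def tree_levels(num_keys, fanout):
    """Number of levels a balanced tree needs to index num_keys keys."""
    levels, capacity = 1, fanout
    while capacity < num_keys:   # integer arithmetic avoids float log rounding
        capacity *= fanout
        levels += 1
    return levels

# Hypothetical sizes: one million keys, nothing cache-resident.
print(tree_levels(1_000_000, 16))  # 5  -> up to 5 misses per lookup
print(tree_levels(1_000_000, 2))   # 20 -> up to 20 misses per lookup
```

Under this simple model, a lookup that traverses an uncached tree pays one memory access per level, which is the cost the distributed approach tries to avoid by keeping each node's part of the tree in cache.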
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Database Systems and Queries · Neural Networks and Applications
