Parallel Joinable B-Trees in the Fork-Join I/O Model
Michael Goodrich, Yan Gu, Ryuto Kitagawa, Yihan Sun

TL;DR
This paper introduces the Fork-Join I/O Model for analyzing I/O costs in fork-join parallel algorithms, and uses it to give parallel union, intersection, and difference algorithms on B-trees with provable bounds on I/O work and I/O span.
Contribution
The paper proposes I/O-efficient parallel algorithms for set operations on B-trees, analyzed in a new computational model, and provides tight bounds on their I/O costs.
Findings
Achieves $O(m \log_B(n/m))$ I/O work for set operations, where $n$ and $m \le n$ are the sizes of the two input trees and $B$ is the block size.
Bounds the I/O span by $O(\log_B m \cdot \log_2 \log_B n + \log_B n)$.
Introduces the Fork-Join I/O Model for analyzing parallel I/O costs.
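To get a feel for the magnitudes involved, the two expressions in the findings above can be evaluated numerically. The sketch below is only illustrative: it drops the constant factors hidden by the $O(\cdot)$ notation, assumes $n$ and $m \le n$ are the two tree sizes and $B$ is the block size, and clamps each logarithm to at least 1 (a convention chosen here, not something the summary states). The function names `io_work_bound` and `io_span_bound` are hypothetical.

```python
import math


def io_work_bound(n: int, m: int, B: int) -> float:
    """Evaluate m * log_B(n/m), ignoring constant factors.

    The logarithm is clamped to at least 1 (an assumption made here)
    so the value stays meaningful when n is close to m.
    """
    return m * max(1.0, math.log(n / m, B))


def io_span_bound(n: int, m: int, B: int) -> float:
    """Evaluate log_B(m) * log_2(log_B(n)) + log_B(n), ignoring constants.

    Both base-B logarithms are clamped to at least 1 (an assumption
    made here) so that log_2 is never taken of a value below 1.
    """
    log_b_m = max(1.0, math.log(m, B))
    log_b_n = max(1.0, math.log(n, B))
    return log_b_m * math.log2(log_b_n) + log_b_n


if __name__ == "__main__":
    # Example sizes chosen for illustration only.
    n, m, B = 2**30, 2**10, 32
    print("I/O work (up to constants):", io_work_bound(n, m, B))
    print("I/O span (up to constants):", io_span_bound(n, m, B))
```

With these example sizes the work expression is about $m \cdot 4$ and the span expression is about 11, which shows how the work scales with the smaller tree while the span stays polylogarithmic.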
Abstract
Balanced search trees are widely used in computer science to efficiently maintain dynamic ordered data. To support efficient set operations (e.g., union, intersection, difference) using trees, the join-based framework is widely studied. This framework has received particular attention in the parallel setting, and has been shown to be effective in enabling simple and theoretically efficient set operations on trees. Despite the widespread adoption of parallel join-based trees, a major drawback of previous work on such data structures is the inefficiency of their input/output (I/O) access patterns. Some recent work (e.g., C-trees and PaC-trees) focused on more I/O-friendly implementations of these algorithms. Surprisingly, however, there have been no results on bounding the I/O-costs for these algorithms. It remains open whether these algorithms can provide tight, provable guarantees in…
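For readers unfamiliar with the join-based framework mentioned in the abstract, the following is a minimal sketch of the generic union recursion built from `split` and `join`. It is not the paper's B-tree algorithm: it runs sequentially on a plain binary search tree, and `join` here does no rebalancing at all, which is an assumption made to keep the example short. All names (`Node`, `join`, `split`, `union`, `from_keys`, `to_list`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Tree = Optional[Node]


def join(left: Tree, key: int, right: Tree) -> Node:
    # Placeholder join: assumes every key in `left` < key < every key in `right`.
    # A real join-based tree would also restore balance here.
    return Node(key, left, right)


def split(t: Tree, key: int) -> Tuple[Tree, bool, Tree]:
    """Return (keys < key, whether key was present, keys > key)."""
    if t is None:
        return None, False, None
    if key == t.key:
        return t.left, True, t.right
    if key < t.key:
        smaller, found, larger = split(t.left, key)
        return smaller, found, join(larger, t.key, t.right)
    smaller, found, larger = split(t.right, key)
    return join(t.left, t.key, smaller), found, larger


def union(t1: Tree, t2: Tree) -> Tree:
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    # Split t1 by the root key of t2, then solve the two halves independently.
    l1, _, r1 = split(t1, t2.key)
    left = union(l1, t2.left)    # these two calls touch disjoint key ranges,
    right = union(r1, t2.right)  # so a fork-join runtime could run them in parallel
    return join(left, t2.key, right)


def from_keys(keys: List[int]) -> Tree:
    t: Tree = None
    for k in keys:
        t = union(t, Node(k))
    return t


def to_list(t: Tree) -> List[int]:
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)


if __name__ == "__main__":
    a = from_keys([1, 3, 5, 7])
    b = from_keys([2, 3, 6])
    print(to_list(union(a, b)))
```

The two recursive calls in `union` are where fork-join parallelism enters. Intersection and difference follow the same split-recurse-join pattern, differing in how the root key and empty subtrees are handled.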
