External Memory Pipelining Made Easy With TPIE
Lars Arge, Mathias Rav, Svend C. Svendsen, Jakob Truelsen

TL;DR
This paper introduces an extension to the TPIE library that simplifies the development of I/O-efficient external memory algorithms by enabling pipelining and parallelization, reducing disk I/O overhead and improving performance.
Contribution
The paper presents a major extension to TPIE that adds pipelining, parallelization, and automatic memory management for more efficient external memory algorithm implementation.
Findings
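Pipelining reduces I/O overhead in external memory algorithms by passing items directly between consecutive components rather than writing intermediate lists to disk.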
The extended TPIE library supports parallel internal memory computation.
The framework has been successfully used in research and commercial applications.
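The I/O saving claimed for pipelining can be illustrated with a small sketch. It does not use TPIE's actual C++ API. The names `CountingStore`, `scale`, `keep_even`, `run_materialized`, and `run_pipelined` are hypothetical, and the store counts individual items rather than disk blocks, which is a simplification of the external memory model. It only shows that chaining two stages directly avoids the intermediate writes and reads that a stage-by-stage implementation incurs.

```python
from typing import Iterable, Iterator, List, Tuple


class CountingStore:
    """Hypothetical stand-in for a disk-backed list; counts items written and read."""

    def __init__(self) -> None:
        self.items: List[int] = []
        self.writes = 0
        self.reads = 0

    def write_all(self, data: Iterable[int]) -> None:
        # Materialize the whole stream, as a non-pipelined stage would.
        for x in data:
            self.items.append(x)
            self.writes += 1

    def read_all(self) -> Iterator[int]:
        # Read the stored stream back, one item at a time.
        for x in self.items:
            self.reads += 1
            yield x


def scale(items: Iterable[int], factor: int) -> Iterator[int]:
    """Stage 1: multiply every item by a constant."""
    for x in items:
        yield x * factor


def keep_even(items: Iterable[int]) -> Iterator[int]:
    """Stage 2: keep only the even items."""
    for x in items:
        if x % 2 == 0:
            yield x


def run_materialized(data: Iterable[int]) -> Tuple[List[int], int]:
    """Each stage writes its full output to a store before the next stage reads it."""
    s1 = CountingStore()
    s1.write_all(scale(data, 3))
    s2 = CountingStore()
    s2.write_all(keep_even(s1.read_all()))
    result = list(s2.read_all())
    transfers = s1.writes + s1.reads + s2.writes + s2.reads
    return result, transfers


def run_pipelined(data: Iterable[int]) -> Tuple[List[int], int]:
    """Stages are chained directly; no intermediate store is touched."""
    result = list(keep_even(scale(data, 3)))
    return result, 0


if __name__ == "__main__":
    print(run_materialized(range(10)))
    print(run_pipelined(range(10)))
```

Both functions compute the same result; only the number of simulated intermediate transfers differs. Python generators are pull-based, so this shows the general idea of streaming between stages, not the specific mechanism the library uses.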
Abstract
When handling large datasets that exceed the capacity of the main memory, movement of data between main memory and external memory (disk), rather than actual (CPU) computation time, is often the bottleneck in the computation. Since data is moved between disk and main memory in large contiguous blocks, this has led to the development of a large number of I/O-efficient algorithms that minimize the number of such block movements. TPIE is one of two major libraries that have been developed to support I/O-efficient algorithm implementations. TPIE provides an interface where list stream processing and sorting can be implemented in a simple and modular way without having to worry about memory management or block movement. However, if care is not taken, such streaming-based implementations can lead to practically inefficient algorithms since lists of data items are typically written to (and…
