Larger than memory image processing
Jon Sporring, David Stansby

TL;DR
This paper presents a streaming architecture and a domain-specific language (DSL) for analysing images that exceed available memory, with a focus on minimizing disk I/O and optimizing processing pipelines.
Contribution
It introduces a novel streaming-based approach and a DSL that automatically optimizes large-scale image analysis pipelines for limited-memory systems.
Findings
Achieves near-linear I/O scans for petascale datasets.
Minimizes redundant disk access compared to chunked layouts.
Provides predictable memory footprints and substantial throughput gains.
Abstract
This report addresses larger-than-memory image analysis for petascale datasets such as 1.4 PB electron-microscopy volumes and 150 TB human-organ atlases. We argue that performance is fundamentally I/O-bound. We show that structuring analysis as streaming passes over data is crucial. For 3D volumes, two representations are popular: stacks of 2D slices (e.g., directories or multi-page TIFF) and 3D chunked layouts (e.g., Zarr/HDF5). Although a chunked on-disk layout is crucial for keeping disk I/O to a minimum in a few algorithms, we show how the slice-based streaming architecture can be built on top of either image representation in a manner that minimizes disk I/O. This is particularly advantageous for algorithms relying on neighbouring values, since the slice-based streaming architecture is 1D, which implies that there are only 2 possible sweeping orders, both of which are aligned with the order…
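The idea of a 1D streaming pass over slices with a bounded neighbourhood can be illustrated with a minimal sketch. This is not the paper's architecture or DSL; it is a plain-Python illustration under stated assumptions: slices arrive in stack order from any iterable source (a directory of files, a multi-page TIFF reader, or a chunked store read slice by slice), and the names `stream_windows`, `z_mean`, and `radius` are hypothetical. Each slice is read from the source exactly once, and at most `2 * radius + 1` slices are held in memory at any time.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Slice = List[List[float]]  # one 2D slice stored as rows of pixel values


def stream_windows(
    slices: Iterable[Slice], radius: int
) -> Iterator[Tuple[int, List[Slice]]]:
    """Yield (centre_index, neighbouring_slices) for every slice in the stack.

    The source is consumed once, in order. The window holds at most
    2 * radius + 1 slices and is truncated at both ends of the stack.
    """
    window: Deque[Slice] = deque()
    first = 0    # stack index of window[0]
    centre = 0   # index of the next slice to emit as a window centre
    count = 0    # number of slices read so far

    def advance() -> None:
        # Move to the next centre and drop the slice that left its neighbourhood.
        nonlocal first, centre
        centre += 1
        if centre - first > radius:
            window.popleft()
            first += 1

    for s in slices:
        window.append(s)
        count += 1
        # A centre is ready once its last neighbour (centre + radius) has arrived.
        if count - 1 == centre + radius:
            yield centre, list(window)
            advance()

    # Flush the centres near the end of the stack, whose windows are truncated.
    while centre < count:
        yield centre, list(window)
        advance()


def z_mean(window: List[Slice]) -> Slice:
    """Per-pixel mean across the slices in the window (a simple z-axis filter)."""
    rows, cols = len(window[0]), len(window[0][0])
    n = len(window)
    return [[sum(s[y][x] for s in window) / n for x in range(cols)]
            for y in range(rows)]
```

A pipeline would then iterate `for index, window in stream_windows(source, radius=1)` and write `z_mean(window)` as output slice `index`. Sweeping in the opposite direction only requires reversing the order in which the source yields slices.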
Taxonomy
Topics: Cell Image Analysis Techniques · Advanced Electron Microscopy Techniques and Applications · Scientific Computing and Data Management
