Scalable Streaming Tools for Analyzing $N$-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
Nikita Ivkin, Zaoxing Liu, Lin F. Yang, Srinivas Suresh Kumar, Gerard Lemson, Mark Neyrinck, Alexander S. Szalay, Vladimir Braverman, Tamas Budavari

TL;DR
This paper introduces a scalable GPU-accelerated streaming tool for analyzing extremely large cosmological N-body simulation datasets, enabling efficient halo detection and statistical analysis on datasets with up to 10^12 particles.
Contribution
It presents a high-performance streaming algorithm that combines GPU acceleration, sampling, and parallel I/O to analyze N-body simulations too large for the authors' earlier streaming approach.
Findings
Scales to datasets with up to 10^12 particles
Detects 10^4-10^5 halo centers efficiently
Operates within an hour on a single GPU
Abstract
Cosmological $N$-body simulations play a vital role in studying models for the evolution of the Universe. To compare to observations and make scientific inferences, statistical analysis of large simulation datasets, e.g., finding halos or obtaining multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to the forbiddingly large datasets produced by modern simulations. Our prior paper proposes memory-efficient streaming algorithms that can find the largest halos in a simulation with up to $10^9$ particles on a small server or desktop. However, this approach fails when directly scaling to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques on GPU boosting, sampling, and parallel I/O, to significantly improve performance and scalability. Our rigorous analysis of the sketch…
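To make the one-pass idea concrete, here is a minimal Python sketch of halo finding framed as a heavy-hitter problem: particles are binned into grid cells, and a small fixed-size sketch tracks approximate per-cell counts while the stream is read once. This is an illustration under stated assumptions, not the authors' implementation. The abstract excerpt above does not say which sketch is used, so a Count-Min sketch stands in here, and the names `CountMinSketch`, `cell_id`, `heaviest_cells` and every parameter (`box`, `n`, `k`, `width`, `depth`) are hypothetical.

```python
import random


class CountMinSketch:
    """Fixed-size frequency table; estimates never fall below the true count."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One (a, b) pair per row: h(key) = ((a * key + b) mod PRIME) mod width.
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        for row, (a, b) in zip(self.rows, self.coeffs):
            yield row, ((a * key + b) % self.PRIME) % self.width

    def add(self, key):
        for row, idx in self._buckets(key):
            row[idx] += 1

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(row[idx] for row, idx in self._buckets(key))


def cell_id(x, y, z, box, n):
    """Map a position in [0, box)^3 to the integer id of its cell on an n^3 grid."""
    i, j, k = (min(int(c / box * n), n - 1) for c in (x, y, z))
    return (i * n + j) * n + k


def heaviest_cells(particles, box, n, k, width=2048, depth=4):
    """Single pass over the particles; returns up to k (cell, estimated count) pairs."""
    sketch = CountMinSketch(width, depth)
    candidates = {}  # cell id -> estimate at the time the cell was last seen
    for x, y, z in particles:
        c = cell_id(x, y, z, box, n)
        sketch.add(c)
        candidates[c] = sketch.estimate(c)
        if len(candidates) > k:
            # Keep memory bounded: evict the candidate with the smallest estimate.
            del candidates[min(candidates, key=candidates.get)]
    return sorted(candidates.items(), key=lambda kv: -kv[1])
```

Calling `heaviest_cells(particles, box=1.0, n=16, k=5)` on an iterable of `(x, y, z)` tuples returns the densest candidate cells in descending order of estimated count. Memory depends only on `width`, `depth`, and `k`, not on the number of particles, which is the property that lets a streaming approach handle datasets that do not fit in memory.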
