TL;DR
This paper introduces a distributed-memory parallel Tucker decomposition method for compressing massive scientific data represented as tensors, achieving high compression ratios with negligible accuracy loss on real-world datasets.
Contribution
It presents the first distributed-memory parallel implementation of the Tucker decomposition for large-scale scientific data, using a tensor data distribution designed to avoid redistributing data.
Findings
Achieves compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy.
Demonstrates scalable parallel performance on real-world datasets.
Provides analysis of computation and communication costs.
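To make the compression figure concrete: a Tucker decomposition stores one small core tensor plus one factor matrix per mode, so the compression ratio is the number of original entries divided by the number of stored entries. The sketch below computes that ratio. The function name and the ranks are illustrative assumptions, not values reported in the paper.

```python
from math import prod


def tucker_compression_ratio(dims, ranks):
    """Original entry count divided by Tucker storage (core + factor matrices).

    dims  -- size of each tensor mode, e.g. (I1, I2, ..., IN)
    ranks -- reduced size of each mode, e.g. (R1, R2, ..., RN)
    """
    if len(dims) != len(ranks):
        raise ValueError("dims and ranks must have the same length")
    original = prod(dims)                               # I1 * I2 * ... * IN
    core = prod(ranks)                                  # R1 * R2 * ... * RN
    factors = sum(i * r for i, r in zip(dims, ranks))   # sum of In * Rn
    return original / (core + factors)


# Hypothetical ranks for a five-way tensor (assumed for illustration only).
dims = (512, 512, 512, 64, 128)
ranks = (32, 32, 32, 8, 16)
ratio = tucker_compression_ratio(dims, ranks)
print(f"compression ratio: {ratio:.0f}")
```

The ratio grows quickly when every mode can be truncated, because the core shrinks multiplicatively while the factor matrices add only a small linear term.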
Abstract
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 5000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that…
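The abstract's size estimate and the basic idea of a Tucker decomposition can be checked with a small single-node sketch. It uses a truncated higher-order SVD (HOSVD), one standard way to compute a Tucker decomposition, in NumPy. This is not the paper's distributed-memory algorithm; the helper names, the tiny tensor sizes, and the ranks are all assumptions chosen for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `matrix` (J x I_mode) into `tensor` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors) with one orthonormal factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each factor basis
    return core, factors


def reconstruct(core, factors):
    """Expand a Tucker representation back into a full tensor."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Size check for the abstract's example: 512^3 grid, 64 variables,
# 128 time steps, 8 bytes per double-precision value.
n_values = 512**3 * 64 * 128
size_tib = n_values * 8 / 2**40  # size in units of 2^40 bytes

# Synthetic tensor with exact multilinear rank (2, 3, 2); sizes are arbitrary.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [rng.standard_normal((n, r))
                for n, r in zip((10, 12, 8), (2, 3, 2))]
x = reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(x, (2, 3, 2))
rel_error = np.linalg.norm(reconstruct(core, factors) - x) / np.linalg.norm(x)
print(f"data size: {size_tib} TiB, relative error: {rel_error:.2e}")
```

Because the synthetic tensor has exact low multilinear rank, the truncated decomposition reproduces it to floating-point precision. Real simulation data is only approximately low rank, so the ranks trade compression against reconstruction error.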
