Relationship-aware Multivariate Sampling Strategy for Scientific Simulation Data
Subhashis Hazarika, Ayan Biswas, Phillip J. Wolfram, Earl Lawrence, Nathan Urban

TL;DR
This paper introduces a multivariate sampling strategy for scientific simulation data that preserves variable relationships, enabling more accurate post-hoc analyses and data reduction compared to traditional univariate methods.
Contribution
It proposes a novel multivariate sampling approach using PCA that can be integrated with existing algorithms and includes data partitioning variants for local relationship modeling.
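To make the idea concrete, here is a minimal sketch of how PCA scores could drive a multivariate sampler. It is not the paper's algorithm: the weighting rule (inverse histogram frequency along each principal component, scaled by that component's explained variance) is an assumption chosen for illustration, and the function names `pca_scores` and `pca_importance_sample` are hypothetical.

```python
import numpy as np

def pca_scores(data):
    """Return PC scores and explained-variance ratios for an (n_points, n_vars) array."""
    # Standardize each variable so no single unit dominates the components.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal axes of the standardized data.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return z @ vt.T, explained

def pca_importance_sample(data, fraction, bins=32, seed=0):
    """Pick a subset of point indices, favoring points that are rare in PC space.

    ASSUMPTION: the importance rule below is illustrative only, not the
    rule used in the paper.
    """
    rng = np.random.default_rng(seed)
    scores, explained = pca_scores(data)
    n = data.shape[0]
    weights = np.zeros(n)
    for k in range(scores.shape[1]):
        counts, edges = np.histogram(scores[:, k], bins=bins)
        # Map each point to its histogram bin (0 .. bins-1).
        idx = np.digitize(scores[:, k], edges[1:-1])
        # Rare bins get larger weight; guard against an empty-bin lookup.
        weights += explained[k] / np.maximum(counts[idx], 1)
    probs = weights / weights.sum()
    n_keep = max(1, int(round(fraction * n)))
    return rng.choice(n, size=n_keep, replace=False, p=probs)
```

Calling `pca_importance_sample(data, 0.1)` returns the indices of roughly 10% of the points. Because the weights are computed on the principal components rather than on each variable separately, points that are unusual in the joint distribution are more likely to be kept. A partitioned variant could apply the same function independently to each spatial block of the data.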
Findings
Effective data reduction demonstrated on real-world datasets
Preserves multivariate relationships for subsequent analysis
Facilitates scalable post-hoc multivariate analysis
Abstract
With the increasing computational power of current supercomputers, the size of data produced by scientific simulations is rapidly growing. To reduce the storage footprint and facilitate scalable post-hoc analyses of such scientific data sets, various data reduction/summarization methods have been proposed over the years. Different flavors of sampling algorithms exist to sample the high-resolution scientific data, while preserving important data properties required for subsequent analyses. However, most of these sampling algorithms are designed for univariate data and cater to post-hoc analyses of single variables. In this work, we propose a multivariate sampling strategy which preserves the original variable relationships and enables different multivariate analyses directly on the sampled data. Our proposed strategy utilizes principal component analysis to capture the variance of…
