Simulating High-Dimensional Multivariate Data using the bigsimr R Package
A. Grant Schissler, Edward J. Bedrick, Alexander D. Knudson, Tomasz J. Kozubowski, Tin Nguyen, Anna K. Panorska, Juli Petereit, Walter W. Piegorsch, Duc Tran

TL;DR
This paper introduces the bigsimr R package for efficient high-dimensional multivariate data simulation with flexible dependency structures, enabling accurate Monte Carlo studies in big data contexts.
Contribution
The paper presents a user-friendly R package that allows high-dimensional data simulation with arbitrary marginals and correlations, leveraging high-performance computing techniques.
Findings
Accurately simulates data with up to 10,000 dimensions.
Demonstrates application to breast cancer RNA-sequencing data.
Shows scalability and efficiency of the approach.
Abstract
It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages empower high-dimensional random vector simulation with arbitrary marginal distributions and dependency via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and…
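The abstract's central idea, random vectors with arbitrary marginal distributions and a chosen correlation, can be illustrated with a small Gaussian copula sketch. This is not the bigsimr API or the authors' code: it is a two-dimensional, standard-library Python illustration, and every function name in it (simulate_pair, exponential_inv_cdf, uniform_inv_cdf, spearman) is hypothetical. It assumes the dependence is specified as a latent normal correlation rho, which for a Gaussian copula implies a Spearman correlation of (6/pi) * asin(rho/2).

```python
import math
import random
from statistics import NormalDist

# Illustrative sketch only: hypothetical helper names, not part of bigsimr.

def exponential_inv_cdf(rate):
    """Quantile function of an Exponential(rate) marginal."""
    return lambda u: -math.log(1.0 - u) / rate

def uniform_inv_cdf(low, high):
    """Quantile function of a Uniform(low, high) marginal."""
    return lambda u: low + (high - low) * u

def simulate_pair(n, rho, inv_cdf_x, inv_cdf_y, seed=0):
    """Draw n pairs whose marginals are given by the two quantile functions
    and whose dependence comes from a Gaussian copula with latent correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Step 1: correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 2: map to uniforms with the normal CDF; clamp away from 1.0
        # so the exponential quantile never evaluates log(0).
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Step 3: apply each marginal's quantile function.
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs

def ranks(values):
    """Zero-based ranks; ties are not handled (continuous data assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(pairs):
    """Spearman correlation: Pearson correlation of the ranks."""
    xs, ys = zip(*pairs)
    return pearson(ranks(xs), ranks(ys))

pairs = simulate_pair(5000, 0.7, exponential_inv_cdf(2.0),
                      uniform_inv_cdf(0.0, 10.0), seed=1)
print("sample Spearman:", round(spearman(pairs), 3))
print("implied Spearman:", round(6.0 / math.pi * math.asin(0.7 / 2.0), 3))
```

The Spearman correlation is checked rather than the Pearson correlation because rank correlations are unchanged by the monotone marginal transforms in step 3, whereas the Pearson correlation of the output generally differs from rho. Scaling this idea to thousands of dimensions is the problem the package addresses.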
Taxonomy
Topics: Statistical Methods and Inference
