Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions
Subha Maity, Yuekai Sun, and Moulinath Banerjee

TL;DR
This paper proposes an integrative sparse regression method for meta-analysis of high-dimensional datasets that are similar but not identical. It targets interpretability, statistical efficiency, and prediction on both previously seen and new data distributions.
Contribution
It defines a global parameter chosen for interpretability and statistical efficiency under heterogeneity, and a one-shot estimator of that parameter whose convergence rate depends on the size of the combined dataset.
Findings
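In high-dimensional linear models, the proposed identification restrictions are shown to be superior both for adapting to a previously seen data distribution and for predicting on a new, unseen one.
Shows the benefits of the approach on a large-scale drug treatment dataset spanning several cancer cell lines.
The one-shot estimator preserves the anonymity of the data sources.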
Abstract
We consider the task of meta-analysis in high-dimensional settings in which the data sources are similar but non-identical. To borrow strength across such heterogeneous datasets, we introduce a global parameter that emphasizes interpretability and statistical efficiency in the presence of heterogeneity. We also propose a one-shot estimator of the global parameter that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. For high-dimensional linear model settings, we demonstrate the superiority of our identification restrictions in adapting to a previously seen data distribution as well as predicting for a new/unseen data distribution. Finally, we demonstrate the benefits of our approach on a large-scale drug treatment dataset involving several different cancer cell-lines.
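To make the "one-shot" idea concrete, here is a minimal sketch of a single-round aggregation step. It assumes each data source has already fitted its own sparse coefficient vector locally and shares only that vector, never its raw data. The abstract does not specify the aggregation rule, so the coordinate-wise median and the hard threshold used here are illustrative stand-ins, and the names `one_shot_aggregate`, `local_estimates`, and `threshold` are hypothetical, not taken from the paper.

```python
from statistics import median


def one_shot_aggregate(local_estimates, threshold):
    """Combine per-source coefficient vectors in one communication round.

    local_estimates: list of equal-length lists, one coefficient vector per
        data source (assumed to be computed locally, e.g. by a sparse fit).
    threshold: hypothetical tuning parameter; aggregated entries whose
        magnitude does not exceed it are set to zero.
    """
    if not local_estimates:
        raise ValueError("need at least one local estimate")
    dim = len(local_estimates[0])
    if any(len(est) != dim for est in local_estimates):
        raise ValueError("all local estimates must have the same length")

    # Coordinate-wise median: an assumed robust aggregation rule, so that a
    # coefficient present in only a minority of sources does not dominate.
    combined = [median(est[j] for est in local_estimates) for j in range(dim)]

    # Hard-threshold the aggregate to keep the global estimate sparse.
    return [c if abs(c) > threshold else 0.0 for c in combined]


# Three hypothetical sources; the last coefficient is active in only one.
local_estimates = [
    [1.0, 0.05, -2.0, 0.0],
    [1.2, -0.02, -1.8, 0.9],
    [0.9, 0.01, -2.1, 0.0],
]
global_estimate = one_shot_aggregate(local_estimates, threshold=0.1)
print(global_estimate)
```

Only the fitted vectors cross between sources and the aggregator, which is the sense in which a scheme of this shape can keep the underlying datasets private. The source-specific coefficient in the second vector is dropped by the median, while the shared coefficients survive the threshold.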
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Machine Learning and Data Classification
