Meta-analysis of heterogeneous data: integrative sparse regression in high-dimensions

Subha Maity, Yuekai Sun, and Moulinath Banerjee

arXiv:1912.11928 · stat.ME · July 1, 2022 · J. Mach. Learn. Res. · 5 cites

TL;DR

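This paper introduces an integrative sparse regression method for meta-analysis of high-dimensional data drawn from similar but non-identical sources. It defines a global parameter designed for interpretability and statistical efficiency, and reports better adaptation to previously seen data distributions and better prediction for unseen ones.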

Contribution

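It defines a global parameter that borrows strength across heterogeneous data sources, and proposes a one-shot estimator of that parameter which preserves the anonymity of the sources and converges at a rate that depends on the size of the combined dataset.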

Findings

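01 In high-dimensional linear models, the proposed identification restrictions are shown to be superior both in adapting to a previously seen data distribution and in predicting for a new, unseen one.

02 The approach shows benefits on a large-scale drug treatment dataset involving several different cancer cell lines.

03 The one-shot estimator preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset.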

Abstract

We consider the task of meta-analysis in high-dimensional settings in which the data sources are similar but non-identical. To borrow strength across such heterogeneous datasets, we introduce a global parameter that emphasizes interpretability and statistical efficiency in the presence of heterogeneity. We also propose a one-shot estimator of the global parameter that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. For high-dimensional linear model settings, we demonstrate the superiority of our identification restrictions in adapting to a previously seen data distribution as well as predicting for a new/unseen data distribution. Finally, we demonstrate the benefits of our approach on a large-scale drug treatment dataset involving several different cancer cell-lines.
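
To make the "one-shot" idea in the abstract concrete, here is a minimal sketch under stated assumptions. Each data source fits a sparse regression locally and shares only its coefficient vector, so raw records never leave the source; a coordinator then combines the vectors in a single round. The abstract does not specify the paper's estimator, so everything here is a stand-in: the plain coordinate-descent lasso, the coordinate-wise median used for aggregation, the hard threshold, and all function names (`local_lasso`, `aggregate`, `simulate_source`) are hypothetical choices for illustration, not the authors' method or the API of the mrlasso repository.

```python
import random
from statistics import median


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def local_lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 at one source."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def aggregate(local_betas, threshold):
    """One-round aggregation: coordinate-wise median, then hard threshold.

    Assumption: the median is a simple robust stand-in for combining
    similar-but-non-identical local estimates; it is not the paper's rule.
    """
    p = len(local_betas[0])
    combined = [median(b[j] for b in local_betas) for j in range(p)]
    return [c if abs(c) > threshold else 0.0 for c in combined]


def simulate_source(rng, beta, n, noise=0.1):
    """Draw a synthetic linear-model dataset for one source (illustration only)."""
    p = len(beta)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x[j] * beta[j] for j in range(p)) + rng.gauss(0, noise) for x in X]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    shared = [2.0, -1.5] + [0.0] * 18  # sparse signal common to all sources
    local_betas = []
    for _ in range(3):
        # Each source perturbs the shared signal slightly (heterogeneity).
        beta_k = [b + rng.gauss(0, 0.05) if b != 0.0 else 0.0 for b in shared]
        X, y = simulate_source(rng, beta_k, n=50)
        local_betas.append(local_lasso(X, y, lam=0.1))  # only this vector is shared
    print(aggregate(local_betas, threshold=0.2))
```

Only the fitted vectors in `local_betas` cross source boundaries, which is the sense in which a one-shot scheme can keep the sources' data private. The regularization level `lam` and the `threshold` are arbitrary values chosen for the toy data.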

Code & Models

Repositories

smaityumich/mrlasso (Official)

Taxonomy

Topics: Statistical Methods and Inference · Gene expression and cancer classification · Machine Learning and Data Classification