Integrative Learning of Structured High-Dimensional Data from Multiple Datasets
Changgee Chang, Zongyu Dai, Jihwan Oh, Qi Long

TL;DR
This paper introduces a novel integrative learning method for high-dimensional data from multiple datasets, leveraging known feature graphs to improve feature selection, especially when signals are weak or heterogeneous across datasets.
Contribution
The proposed approach incorporates prior feature graph information to enhance signal detection and to handle heterogeneity across datasets, outperforming existing methods.
Findings
Method improves detection of weak signals in heterogeneous data
Theoretical analysis confirms consistency and robustness
Application to gene expression data demonstrates practical advantages
Abstract
Integrative learning of multiple datasets has the potential to mitigate the challenge of small n and large p that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although some existing integrative learning methods allow heterogeneous sparsity structure where a subset of datasets can have zero coefficients for some selected features, they tend to yield reduced efficiency, reinstating the problem of losing weak important signals. We propose a new integrative learning approach which can not only aggregate important signals well in homogeneous sparsity structure, but also substantially alleviate the problem of losing weak important signals in heterogeneous sparsity…
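The abstract's point that joint selection can recover weak signals can be illustrated with a small sketch. This is not the paper's method: it uses standard group-lasso-style row thresholding as a stand-in, and the coefficient matrix `B`, the threshold `lam`, and both function names are hypothetical values chosen for illustration. Using the same `lam` for both rules is a simplification.

```python
import numpy as np

def soft_threshold(B, lam):
    """Elementwise (lasso-style) thresholding: each coefficient is shrunk on its own."""
    return np.sign(B) * np.maximum(np.abs(B) - lam, 0.0)

def group_soft_threshold(B, lam):
    """Row-wise (group-lasso-style) thresholding.

    B has shape (n_features, n_datasets). Each row holds one feature's
    coefficients across all datasets and is kept or zeroed as a unit.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return B * scale

# Hypothetical coefficient estimates: 3 features x 4 datasets.
B = np.array([
    [0.8, 0.8, 0.8, 0.8],    # feature 0: weak signal shared by every dataset
    [0.1, -0.1, 0.1, -0.1],  # feature 1: noise
    [0.9, 0.9, 0.0, 0.0],    # feature 2: signal in only two datasets
])
lam = 1.0  # assumed threshold, same for both rules for simplicity

separate = soft_threshold(B, lam)       # each dataset analysed alone
joint = group_soft_threshold(B, lam)    # evidence pooled across datasets

selected_separately = np.any(separate != 0, axis=1)
selected_jointly = np.any(joint != 0, axis=1)

print("selected separately:", selected_separately.tolist())
print("selected jointly:   ", selected_jointly.tolist())
```

In this toy setting no single coefficient exceeds the threshold, so per-dataset thresholding selects nothing, while pooling across datasets retains features 0 and 2. Feature 2 is the heterogeneous case the abstract describes: its pooled evidence is weaker because only a subset of datasets carries the signal.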
Taxonomy
Topics: Gene expression and cancer classification · Face and Expression Recognition · Machine Learning and Data Classification
