Robust structured heterogeneity analysis approach for high-dimensional data

Yifan Sun; Ziye Luo; Xinyan Fan

arXiv:2211.16475·stat.ME·November 30, 2022

TL;DR

This paper introduces a robust method for analyzing high-dimensional biomedical data that identifies disease subgroups and their subgroup-specific important genes. The method accommodates data contamination and accounts for interconnections among genes, and the authors report that it outperforms existing methods in simulations.

Contribution

It develops a novel robust structured heterogeneity analysis approach combining Huber loss and overlapping group lasso for better subgroup and gene identification.
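To make the two named ingredients concrete, here is a minimal sketch in Python of a Huber loss and a group-norm penalty over overlapping index sets. It is not the authors' implementation: the function names, the default threshold `delta=1.345`, the tuning parameter `lam`, and the toy data are all assumptions made for illustration, and the penalty shown is the plain sum of group norms, which may differ from the exact overlapping group lasso formulation used in the paper.

```python
import numpy as np


def huber_loss(residuals, delta=1.345):
    """Elementwise Huber loss: quadratic near zero, linear in the tails.

    delta is a hypothetical default, not a value taken from the paper.
    """
    r = np.abs(np.asarray(residuals, dtype=float))
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)


def group_norm_penalty(beta, groups, lam=1.0):
    """lam times the sum of Euclidean norms of beta over index groups.

    Groups may share indices; this is an illustrative stand-in for an
    overlapping group lasso penalty, not the paper's exact definition.
    """
    beta = np.asarray(beta, dtype=float)
    return lam * sum(np.linalg.norm(beta[list(g)]) for g in groups)


# Toy residuals (made up): the last one mimics a contaminated observation.
residuals = np.array([0.5, -1.0, 8.0])
print(huber_loss(residuals, delta=1.0))

# Coefficient 1 belongs to both groups, as a gene shared by two gene sets would.
beta = np.array([3.0, 4.0, 0.0])
groups = [[0, 1], [1, 2]]
print(group_norm_penalty(beta, groups))
```

With `delta=1.0`, the residual of 8 contributes 7.5 under the Huber loss, whereas a squared-error loss of the same form would contribute 32, which is why a single contaminated observation has less pull on the fit. The shared index in `groups` shows how one coefficient can be penalized through more than one group.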

Findings

01 Outperforms existing methods in simulations

02 Reveals meaningful subgroups in cancer data

03 Improves prediction and stability

Abstract

Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies. The problem is made harder by the heterogeneity of diseases: patients perceived to have the same disease may form multiple subgroups, and different subgroups have distinct sets of important genes. It is hence imperative to discover the latent subgroups and reveal the subgroup-specific important genes. Some heterogeneity analysis methods have been proposed in the recent literature. Despite considerable successes, most existing studies remain limited because they cannot accommodate data contamination and they ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach to identify subgroups, select important genes, and estimate their effects on the phenotype of interest. Possible data contamination is accommodated…
