Mixture Envelope Model for Heterogeneous Genomics Data Analysis

Bochao Jia

arXiv:1805.01864 · stat.ME · May 7, 2018

TL;DR

This paper introduces a mixture envelope model for heterogeneous genomics data that classifies observations into subgroups and improves prediction accuracy, as demonstrated through simulations and an analysis of breast cancer data.

Contribution

It proposes a novel mixture envelope model with an ICC algorithm for simultaneous classification and regression in heterogeneous data settings.
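The summary does not describe how the ICC algorithm works. To illustrate what simultaneous classification and regression involves, here is a minimal sketch of a different, standard technique: a plain EM algorithm for a mixture of multivariate linear regressions. Every observation receives posterior group probabilities, and every group gets its own coefficients and error covariance. This is not the author's ICC algorithm, and it imposes no envelope structure. The helper names `mvn_logpdf` and `mixture_regression_em`, the random-subset initialization, and the small `ridge` term are all assumptions made for this example.

```python
import numpy as np


def mvn_logpdf(resid, cov):
    """Log density of N(0, cov) at each row of `resid` (n x r)."""
    r = resid.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    sol = np.linalg.solve(cov, resid.T)                 # r x n
    maha = np.sum(resid.T * sol, axis=0)                # squared Mahalanobis distances
    return -0.5 * (r * np.log(2 * np.pi) + logdet + maha)


def mixture_regression_em(X, Y, K, n_iter=200, seed=0, ridge=1e-6):
    """Plain EM for a K-component mixture of multivariate linear regressions.

    X is n x p, Y is n x r. Returns mixing weights, per-group coefficient
    matrices ((p+1) x r, intercept row first), per-group error covariances,
    the n x K posterior group probabilities and the last log-likelihood.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    r = Y.shape[1]
    Z = np.hstack([np.ones((n, 1)), X])                 # design matrix with intercept
    # Start each group from an OLS fit on a few random rows so the groups begin
    # clearly apart (random soft labels would start EM near the point where all
    # groups coincide, which EM leaves only slowly).
    pis = np.full(K, 1.0 / K)
    coefs, covs = [], []
    for _ in range(K):
        idx = rng.choice(n, size=min(n, 2 * (p + 1)), replace=False)
        B = np.linalg.lstsq(Z[idx], Y[idx], rcond=None)[0]
        R = Y - Z @ B
        coefs.append(B)
        covs.append(R.T @ R / n + ridge * np.eye(r))
    for _ in range(n_iter):
        # E-step: posterior probability that observation i belongs to group k.
        logp = np.column_stack(
            [np.log(pis[k]) + mvn_logpdf(Y - Z @ coefs[k], covs[k]) for k in range(K)]
        )
        top = logp.max(axis=1, keepdims=True)
        loglik = float(np.sum(top[:, 0] + np.log(np.exp(logp - top).sum(axis=1))))
        resp = np.exp(logp - top)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and weighted residual covariance per group.
        pis = np.clip(resp.mean(axis=0), 1e-12, None)
        coefs, covs = [], []
        for k in range(K):
            w = resp[:, k]
            Zw = Z * w[:, None]
            B = np.linalg.solve(Z.T @ Zw + ridge * np.eye(p + 1), Zw.T @ Y)
            R = Y - Z @ B
            covs.append((R * w[:, None]).T @ R / max(w.sum(), 1e-12) + ridge * np.eye(r))
            coefs.append(B)
    return pis, coefs, covs, resp, loglik


# Usage: two simulated groups with opposite regression relationships.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=(n, 1))
g = rng.integers(0, 2, size=n)                          # hidden group labels
sign = np.where(g == 0, 1.0, -1.0)[:, None]
Y = sign * np.hstack([3 + 2 * x, 1 - x]) + 0.3 * rng.normal(size=(n, 2))
fits = [mixture_regression_em(x, Y, K=2, seed=s) for s in range(10)]
best = max(fits, key=lambda f: f[-1])                   # multi-start: keep best log-likelihood
labels = best[3].argmax(axis=1)
acc = max(np.mean(labels == g), np.mean(labels != g))   # accuracy up to label switching
```

EM only finds a local optimum, so the usage example runs several random starts and keeps the one with the highest log-likelihood. Group labels are identifiable only up to permutation, which is why accuracy is computed under both label assignments.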

Findings

1. Outperforms existing methods in classification accuracy.
2. Improves prediction performance in simulation studies.
3. Successfully identifies breast cancer subtypes and gene associations.

Abstract

The envelope model, a refinement of the multivariate linear regression model, was proposed to address regression problems with multiple responses. It measures the linear association between the predictors and the responses by using the minimal reducing subspace of the covariance matrix that accommodates the mean function. In many real applications, however, the data may be affected by many unknown confounding factors or may come from different sources. The dependency structure may therefore be heterogeneous across the population, dividing it into distinct groups. For example, breast cancer patients fall into several subtypes, and each subtype group has its own gene interaction mechanism. In this setting, fitting a single model to all observations ignores the differences between groups, while fitting a separate model to each group is infeasible due to the unknown group…
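The envelope idea can be made concrete at the population level. Suppose the error covariance Σ (r × r) and the coefficient matrix β (r × p) of the regression of the responses on the predictors are known. Then the Σ-envelope of span(β), meaning the smallest reducing subspace of Σ that contains span(β), equals the sum of the projections of span(β) onto the eigenspaces of Σ. The minimal sketch below computes an orthonormal basis of that subspace. The function name `envelope_basis` and the eigenvalue tolerance are illustrative assumptions. Actual envelope estimation works from sample estimates rather than a known Σ and β, and this sketch does not attempt that.

```python
import numpy as np


def envelope_basis(sigma, beta, tol=1e-8):
    """Orthonormal basis of the Sigma-envelope of span(beta).

    sigma: r x r symmetric positive definite error covariance (assumed known).
    beta:  r x p coefficient matrix of the regression of Y on X (assumed known).
    Uses E_Sigma(B) = sum_i P_i B, where P_i projects onto the i-th
    eigenspace of Sigma.
    """
    sigma = np.asarray(sigma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(sigma)        # eigenvalues in ascending order
    r = len(eigvals)
    pieces, start = [], 0
    while start < r:
        # Group numerically equal eigenvalues into one eigenspace.
        stop = start + 1
        while stop < r and abs(eigvals[stop] - eigvals[start]) <= tol * max(1.0, abs(eigvals[start])):
            stop += 1
        V = eigvecs[:, start:stop]                  # orthonormal basis of this eigenspace
        proj = V @ (V.T @ beta)                     # P_i beta
        if np.linalg.norm(proj) > tol:              # eigenspace touched by the mean function
            pieces.append(proj)
        start = stop
    if not pieces:                                  # beta = 0: the envelope is {0}
        return np.zeros((r, 0))
    U, s, _ = np.linalg.svd(np.hstack(pieces), full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))
    return U[:, :rank]


# Usage: three responses, one predictor whose effect lies along the first axis.
sigma = np.diag([4.0, 1.0, 1.0])
beta = np.array([[1.0], [0.0], [0.0]])
basis = envelope_basis(sigma, beta)                 # 3 x 1, spanned by the first axis
```

Here the envelope is one-dimensional. Only one linear combination of the three responses carries information about the predictor, and the other two directions are immaterial. When Σ is a multiple of the identity, the envelope is simply span(β).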

Taxonomy

Topics: Gene expression and cancer classification · Bayesian Methods and Mixture Models · Algorithms and Data Compression