Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data

Tzu-Yu Liu; Ajay Kannan; Adam Drake; Marvin Bertin; Nathan Wan

arXiv:1812.04778 · cs.LG · December 13, 2018 · 1 citation

TL;DR

This paper presents methods to improve the generalization of models trained on confounded biological data by controlling for confounders through normalization (ONION) and adversarial training (DANN), demonstrated on simulated and empirical patient data.

Contribution

It introduces ONION normalization to remove confounding covariates and applies the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information.

Findings

01. Significant improvement in model generalization on simulated data

02. Enhanced prediction accuracy on empirical patient data

03. Effective removal of confounder influence in biological datasets

Abstract

Statistical learning on biological data can be challenging due to confounding variables in sample collection and processing. Confounders can cause models to generalize poorly and result in inaccurate prediction performance metrics if models are not validated thoroughly. In this paper, we propose methods to control for confounding factors and further improve prediction performance. We introduce OrthoNormal basis construction In cOnfounding factor Normalization (ONION) to remove confounding covariates and use the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information. We apply the proposed methods to simulated and empirical patient data and show significant improvements in generalization.
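
The abstract does not spell out how ONION builds its orthonormal basis, so the following is only a minimal sketch of the general idea it names: construct an orthonormal basis for the space spanned by the confounding covariates and project it out of the features. It is not the authors' algorithm. The function name `remove_confounder_subspace`, the use of an SVD to obtain the basis, the added intercept column, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def remove_confounder_subspace(X, C):
    """Remove the linear span of the confounders C from each feature in X.

    X: (n_samples, n_features) feature matrix.
    C: (n_samples, n_confounders) confounder covariates.
    Hypothetical helper, not taken from the paper.
    """
    n = X.shape[0]
    # Assumption: include an intercept so the projection also centers features.
    design = np.column_stack([np.ones(n), C])
    # Left singular vectors with non-negligible singular values form an
    # orthonormal basis for the column space of the design matrix.
    U, s, _ = np.linalg.svd(design, full_matrices=False)
    tol = s.max() * max(design.shape) * np.finfo(float).eps
    Q = U[:, s > tol]
    # Subtract the component of X that lies in the confounder subspace.
    return X - Q @ (Q.T @ X)


# Toy data: a binary confounder (for example, a processing batch) that
# shifts every feature by a constant amount.
rng = np.random.default_rng(0)
n = 200
C = rng.integers(0, 2, size=(n, 1)).astype(float)
X = rng.normal(size=(n, 5)) + 3.0 * C

X_clean = remove_confounder_subspace(X, C)
# Largest absolute inner product between the confounder and any cleaned feature.
residual = np.abs(C.T @ X_clean).max()
```

After the projection, every cleaned feature is orthogonal to the confounder column, so `residual` is zero up to floating-point error. A projection like this removes only the linear component of the confounder, and it also removes any label signal that is linearly aligned with the confounder, which is one reason an adversarial penalty such as DANN can be a useful complement.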

Taxonomy

Topics: Machine Learning in Healthcare · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning