Towards IID representation learning and its application on biomedical data

Jiqing Wu; Inti Zlobec; Maxime Lafarge; Yukun He; Viktor H. Koelzer

arXiv:2203.00332 · cs.LG · March 2, 2022 · 1 citation

TL;DR

This paper introduces IID representation learning, an approach to improving out-of-distribution (OOD) generalization on biomedical data by learning a task-relevant function that induces IID among the transformed data.

Contribution

It proposes a novel IID representation learning framework and demonstrates its effectiveness on biomedical OOD tasks, outperforming state-of-the-art methods.

Findings

1. Superior OOD generalization performance on biomedical datasets.
2. Effective induction of IID representations improves robustness.
3. Reproducible code available for benchmarking and further research.

Abstract

Due to the heterogeneity of real-world data, the widely accepted independent and identically distributed (IID) assumption has been criticized in recent studies on causality. In this paper, we argue that instead of being a questionable assumption, IID is a fundamental task-relevant property that needs to be learned. Considering $k$ independent random vectors $X^{i}$, $i = 1, \dots, k$, we elaborate on how a variety of causal questions can be reformulated as learning a task-relevant function $\phi$ that induces IID among $Z^{i} := \phi \circ X^{i}$, which we term IID representation learning. As a proof of concept, we examine IID representation learning on Out-of-Distribution (OOD) generalization tasks. Concretely, by utilizing the representation obtained via the learned function that induces IID, we conduct prediction of molecular characteristics (molecular…
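The criterion in the abstract can be illustrated with a small numerical sketch. Because the $X^{i}$ are independent and $\phi$ is a fixed deterministic map, the $Z^{i} = \phi \circ X^{i}$ are automatically independent, so what remains is making them identically distributed. The sketch below scores a candidate $\phi$ by the average pairwise squared maximum mean discrepancy (MMD) between the $Z^{i}$. MMD is a stand-in discrepancy chosen here for illustration, not necessarily the criterion or training procedure used in the paper or in ctplab/iid_representation_learning. The toy environments (a shared signal coordinate plus an environment-specific shifted nuisance coordinate) and the functions `iid_gap` and `keep_signal` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(a, b, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples a and b."""
    k_aa = gaussian_kernel(a, a, bandwidth).mean()
    k_bb = gaussian_kernel(b, b, bandwidth).mean()
    k_ab = gaussian_kernel(a, b, bandwidth).mean()
    return k_aa + k_bb - 2.0 * k_ab


def iid_gap(phi, envs, bandwidth=1.0):
    """Hypothetical score: mean pairwise MMD^2 among Z^i = phi(X^i).

    Near zero when all Z^i appear to share one distribution; MMD is an
    illustrative stand-in, not necessarily the paper's criterion.
    """
    zs = [phi(x) for x in envs]
    gaps = [
        mmd2(zs[i], zs[j], bandwidth)
        for i in range(len(zs))
        for j in range(i + 1, len(zs))
    ]
    return float(np.mean(gaps))


def identity(x):
    return x


def keep_signal(x):
    # Hypothetical phi: keep only the first (task-relevant) coordinate.
    return x[:, :1]


# Toy data: k independent environments X^i, each with a shared signal
# coordinate and a nuisance coordinate whose mean shifts with i.
rng = np.random.default_rng(0)
k, n = 3, 200
envs = []
for i in range(k):
    signal = rng.normal(size=(n, 1))
    nuisance = rng.normal(loc=2.0 * i, size=(n, 1))
    envs.append(np.hstack([signal, nuisance]))

print(iid_gap(identity, envs) > iid_gap(keep_signal, envs))  # prints True
```

The map that discards the shifted coordinate scores far lower than the identity map, which is the sense in which it induces IID across the toy environments. In practice $\phi$ would be learned rather than hand-picked, and a distribution-matching score alone does not guarantee that the representation stays task-relevant.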

Code & Models

Repositories

ctplab/iid_representation_learning
PyTorch · Official

Taxonomy

Topics: Gene expression and cancer classification · Health, Environment, Cognitive Aging · Bayesian Modeling and Causal Inference