Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction

Marc-Andre Schulz; Bertrand Thirion; Alexandre Gramfort; Gaël Varoquaux; Danilo Bzdok

arXiv:2110.06135·cs.LG·October 13, 2021·1 cites

Open Access

TL;DR

This paper shows that low-dimensional embeddings derived from large biobank datasets can improve phenotype prediction in data-scarce health settings, with Variational Autoencoder embeddings typically scaling better with additional unlabeled data than PCA or Isomap.

Contribution

The paper leverages Variational Autoencoder manifolds learned from unlabeled population data to enhance phenotype prediction in data-scarce biomedical contexts, and shows that the benefit scales with the amount of unlabeled data.

Findings

01. VAE manifolds outperform PCA and Isomap in phenotype prediction

02. Embedding spaces improve prediction accuracy in data-scarce settings

03. Unsupervised learning scales well with increasing unlabeled data

Abstract

High-quality data accumulation is now becoming ubiquitous in the health domain. There is increasing opportunity to exploit rich data from normal subjects to improve supervised estimators in specific diseases with notorious data scarcity. We demonstrate that low-dimensional embedding spaces can be derived from the UK Biobank population dataset and used to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics. Phenotype predictions facilitated by Variational Autoencoder manifolds typically scaled better with increasing unlabeled data than dimensionality reduction by PCA or Isomap. Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
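
The two-stage idea in the abstract (learn an embedding from plentiful unlabeled data, then fit a supervised model on a few labeled samples in that embedding) can be illustrated with a toy sketch. This is not the authors' pipeline: it uses PCA, one of the baselines the abstract names, rather than a Variational Autoencoder; the data are synthetic rather than UK Biobank; and every function name, sample size, and noise level below is a hypothetical choice made for illustration.

```python
import numpy as np


def make_data(n, rng, mixing, noise=0.5):
    """Synthetic stand-in: a few latent factors drive many noisy features."""
    z = rng.normal(size=(n, mixing.shape[0]))
    x = z @ mixing + noise * rng.normal(size=(n, mixing.shape[1]))
    y = (z[:, 0] > 0).astype(int)  # label depends on the first latent factor
    return x, y


def fit_pca(x, k):
    """Return the feature mean and the top-k principal axes (rows)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def embed(x, mean, components):
    """Project samples onto the learned low-dimensional space."""
    return (x - mean) @ components.T


def fit_ridge(x, y, alpha=1.0):
    """Ridge regression on +/-1 targets, used here as a simple linear classifier."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    targets = 2.0 * y - 1.0
    gram = xb.T @ xb + alpha * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)


def predict(x, w):
    xb = np.hstack([x, np.ones((len(x), 1))])
    return (xb @ w > 0).astype(int)


rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 200))               # 5 latent factors, 200 features
x_unlabeled, _ = make_data(5000, rng, mixing)    # large pool, labels discarded
x_train, y_train = make_data(30, rng, mixing)    # scarce labeled set
x_test, y_test = make_data(1000, rng, mixing)

# Stage 1: unsupervised embedding learned from the unlabeled pool only.
mean, comps = fit_pca(x_unlabeled, k=5)

# Stage 2: supervised model fit on the few labeled samples, in embedding space.
w_emb = fit_ridge(embed(x_train, mean, comps), y_train)
acc_emb = (predict(embed(x_test, mean, comps), w_emb) == y_test).mean()

# Reference: the same classifier fit directly on the 200 raw features.
w_raw = fit_ridge(x_train, y_train)
acc_raw = (predict(x_test, w_raw) == y_test).mean()

print(f"accuracy with embedding: {acc_emb:.3f}")
print(f"accuracy on raw features: {acc_raw:.3f}")
```

Varying the size of `x_unlabeled` while holding the 30 labeled samples fixed gives a rough analogue of the scaling question the abstract raises. A toy setup like this says nothing about how PCA, Isomap, and VAE embeddings compare on real biobank data.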

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: Principal Components Analysis