METCC: METric learning for Confounder Control. Making distance matter in high dimensional biological analysis
Kabir Manghnani, Adam Drake, Nathan Wan, and Imran Haque

TL;DR
METCC uses contrastive metric learning with a triplet network to separate biologically distinct sample classes in high-dimensional genomic data while reducing confounding from technical and irrelevant biological variation, even when detailed metadata is unavailable.
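A triplet network is trained on (anchor, positive, negative) examples: the positive should end up closer to the anchor than the negative by at least a margin. The sketch below shows the standard triplet margin loss, max(0, d(a, p) − d(a, n) + margin), on plain embedding vectors. It assumes Euclidean distance and a margin of 1.0; the paper's actual distance, margin and network architecture are not stated here, and the function names are illustrative only.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive, so only violating triplets contribute.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    a = [0.0, 0.0]
    p = [0.0, 1.0]  # same class as the anchor, distance 1
    n = [3.0, 4.0]  # different class, distance 5
    print(triplet_loss(a, p, n))             # 0.0: constraint already satisfied
    print(triplet_loss(a, p, [0.0, 1.5]))    # 1 - 1.5 + 1 = 0.5
```

In a real model the vectors would be outputs of a non-linear embedding network and the loss would be averaged over a batch of triplets and minimized by gradient descent.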
Contribution
This work introduces a non-linear contrastive metric learning approach, METCC, that improves confounder control in biological data analysis beyond traditional linear methods, especially when metadata is limited.
Findings
METCC matches or exceeds linear methods in classification performance.
METCC reduces confounding from technical variables like batch and institution.
The approach performs well without requiring detailed metadata.
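One simple way to check whether an embedding still carries batch or institution structure is to look at each sample's nearest neighbours: a well-controlled embedding should group samples by biological class, not by batch. The diagnostic below is a hypothetical illustration, not the evaluation reported in the paper; the function name `knn_label_agreement` and the choice of k are assumptions.

```python
import math
from typing import List, Sequence


def knn_label_agreement(embeddings: List[Sequence[float]],
                        labels: Sequence[str], k: int = 3) -> float:
    """Mean fraction of each point's k nearest neighbours sharing its label.

    Hypothetical diagnostic: high agreement on class labels and low
    agreement on batch labels suggests the embedding tracks biology
    rather than the technical confounder.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Sort all other points by distance to point i (ties broken by index).
        dists = sorted(
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n


if __name__ == "__main__":
    emb = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    classes = ["A", "A", "B", "B"]
    batches = ["b1", "b2", "b1", "b2"]
    print(knn_label_agreement(emb, classes, k=1))  # 1.0: neighbours share class
    print(knn_label_agreement(emb, batches, k=1))  # 0.0: neighbours differ in batch
```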
Abstract
High-dimensional data acquired from biological experiments such as next-generation sequencing are subject to a number of confounding effects. These include technical effects, such as batch-to-batch variation from instrument noise or sample processing, or institution-specific differences in sample acquisition and physical handling, as well as biological effects arising from true but irrelevant differences in the biology of each sample, such as age biases in disease. Prior work has used linear methods to adjust for such batch effects. Here, we apply contrastive metric learning using a non-linear triplet network to optimize the ability to distinguish biologically distinct sample classes in the presence of irrelevant technical and biological variation. Using whole-genome cell-free DNA data from 817 patients, we demonstrate that our approach, METric learning for Confounder Control…
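How triplets are chosen determines which variation the network learns to ignore. As an illustration only, assuming per-sample class and batch labels are available (the abstract notes METCC does not require detailed metadata, and the paper's own sampling scheme may differ), one could pick positives from the same class but a different batch, and negatives from a different class but the same batch. Distances driven purely by batch then increase the loss. The function `sample_triplet` below is hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_triplet(classes: Sequence[str], batches: Sequence[str],
                   anchor: int, rng: random.Random
                   ) -> Optional[Tuple[int, int, int]]:
    """Return (anchor, positive, negative) indices, or None if impossible.

    Hypothetical confounder-aware sampling:
      positive: same class as the anchor, preferring a different batch;
      negative: different class, preferring the same batch.
    """
    others = [i for i in range(len(classes)) if i != anchor]
    pos = [i for i in others if classes[i] == classes[anchor]]
    neg = [i for i in others if classes[i] != classes[anchor]]
    if not pos or not neg:
        return None
    # Fall back to any same-class / other-class sample if no "hard" one exists.
    hard_pos = [i for i in pos if batches[i] != batches[anchor]] or pos
    hard_neg = [i for i in neg if batches[i] == batches[anchor]] or neg
    return anchor, rng.choice(hard_pos), rng.choice(hard_neg)


if __name__ == "__main__":
    classes = ["cancer", "cancer", "healthy", "healthy"]
    batches = ["b1", "b2", "b1", "b2"]
    print(sample_triplet(classes, batches, 0, random.Random(0)))  # (0, 1, 2)
```

Pairing this sampler with a triplet margin loss pushes same-class samples from different batches together and different-class samples from the same batch apart.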
Taxonomy
Topics: Gene expression and cancer classification · Single-cell and spatial transcriptomics · Bioinformatics and Genomic Networks
