Detecting Batch Heterogeneity via Likelihood Clustering
Austin Talbot, Yue Ke

TL;DR
This paper presents a likelihood-based clustering method to detect batch heterogeneity in genomic data, improving reliability in clinical CNV detection by distinguishing technical artifacts from biological signals.
Contribution
The authors introduce a novel clustering approach that groups samples by their likelihood-based (Bayesian model) evidence, detecting batch effects without prior batch labels and outperforming existing methods in accuracy and clinical applicability.
Findings
Accurately detects batch effects across multiple sequencing panels and modalities.
Outperforms standard correlation-based and dimensionality-reduction methods.
Maintains conservative false positive rates suitable for clinical use.
Abstract
Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from next-generation sequencing (NGS) data, many algorithms compare read depth between test samples and a reference sample, assuming they are process-matched. When this assumption is violated, with causes ranging from reagent lot changes to multi-site processing, the reference becomes inappropriate, introducing false CNV calls or masking true pathogenic variants. Detecting such heterogeneity before downstream analysis is critical for reliable clinical interpretation. Existing batch-effect detection methods either cluster samples based on raw features, risking conflation of biological signal with technical variation, or require known batch labels that are frequently unavailable. We introduce a method that addresses both limitations by clustering samples according to their Bayesian model evidence. The central insight…
Taxonomy
Topics: Genomic variations and chromosomal abnormalities · Cancer Genomics and Diagnostics · Genomics and Rare Diseases
