Scalable Batch Correction for Cell Painting via Batch-Dependent Kernels and Adaptive Sampling
Aditya Narayan Ravi, Snehal Vadvalkar, Abhishek Pandey, Ilan Shomorony

TL;DR
BALANS is a scalable, efficient batch correction method for Cell Painting data that constructs a batch-aware affinity matrix using adaptive sampling, improving correction quality and runtime on large datasets.
Contribution
We introduce BALANS, a novel scalable batch correction technique using batch-dependent kernels and adaptive sampling, with theoretical guarantees and practical efficiency.
Findings
Balances correction quality with computational efficiency.
Runs in nearly linear time for large datasets.
Improves over existing batch correction methods in real-world experiments.
Abstract
Cell Painting is a microscopy-based, high-content imaging assay that produces rich morphological profiles of cells and can support drug discovery by quantifying cellular responses to chemical perturbations. At scale, however, Cell Painting data is strongly affected by batch effects arising from differences in laboratories, instruments, and protocols, which can obscure biological signal. We present BALANS (Batch Alignment via Local Affinities and Subsampling), a scalable batch-correction method that aligns samples across batches by constructing a smoothed affinity matrix from pairwise distances. Given $n$ data points, BALANS builds a sparse affinity matrix $A$ using two ideas. (i) For points $i$ and $j$, it sets a local scale using the distance from $i$ to its $k$-th nearest neighbor within the batch of $j$, then computes $A_{ij}$ via a Gaussian kernel…
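To make idea (i) concrete, here is a minimal NumPy sketch of a batch-dependent Gaussian affinity. It is illustrative only and not the authors' implementation: the function name `batch_aware_affinity`, the dense quadratic-time computation, and the exact kernel form exp(-d²/σ²) are assumptions, since the abstract is truncated before the kernel is fully specified. The sketch also omits the sparsification and adaptive sampling that make BALANS scalable.

```python
import numpy as np

def batch_aware_affinity(X, batches, k=2):
    """Dense toy affinity (assumed form): A[i, j] = exp(-d(i, j)^2 / sigma^2),
    where sigma is the distance from i to its k-th nearest neighbor
    among the points that share a batch with j."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    A = np.zeros((n, n))
    for b in np.unique(batches):
        cols = np.flatnonzero(batches == b)
        # Distances from every point to the points of batch b, sorted per row.
        # A point counts as its own nearest neighbor within its own batch.
        D_b = np.sort(D[:, cols], axis=1)
        # Index of the k-th nearest neighbor, clipped for very small batches.
        kk = min(k, len(cols)) - 1
        # Guard against a zero scale (e.g. duplicate points).
        sigma = np.maximum(D_b[:, kk], 1e-12)
        A[:, cols] = np.exp(-(D[:, cols] ** 2) / (sigma[:, None] ** 2))
    return A
```

For example, with 1-D points 0, 1, 3 in one batch, 10, 11, 13 in another, and k=2, the affinity from the point at 0 to the point at 11 is exp(-1), the same as its affinity to its in-batch neighbor at 1, because each is the second-nearest point within its own batch. A single global scale would instead drive the cross-batch affinity toward zero.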
Peer Reviews
Decision · Submitted to ICLR 2026
1. This paper is well organized and flows naturally; the need for addressing batch effects is clearly motivated.
2. Combining batch-aware local affinities with adaptive sampling and low-rank approximation is a scalable and well-reasoned solution. The authors provide proofs for coverage guarantees and approximation error bounds of the sparse affinity matrix.
3. Evaluations span multiple real-world Cell Painting datasets and synthetic scalability tests; BALANS achieves consistently strong perfor
1. Core ideas like adaptive kernels and landmark-based sampling are not entirely new for biological data or affinity matrix computation. Prior work using adaptive bandwidths and landmark-based scalable affinity construction, such as PHATE (Kevin Moon et al.), is not cited.
2. Figure 4 is presented but lacks sufficient interpretation or biological insight; more discussion of qualitative improvements would strengthen the narrative.
3. While quantitative metrics are discussed, more analysis on w
- The problem and the rationale behind the method are well illustrated.
- Compared to the baseline methods, BALANS demonstrates a fast run time, which is important for applications to high-throughput Cell Painting assays.
- The paper presents both theoretical and empirical analysis of the algorithm.
- To me, BALANS seems very similar to BBKNN [1], which is not cited, compared against, or discussed in the paper. BBKNN constructs a graph by independently identifying the k nearest neighbors of each cell within each batch, and then merges these neighbor sets. This seems similar to the batch-dependent local scale. Furthermore, BBKNN uses Annoy instead of a low-rank approximation to compute the affinity matrix efficiently, and the paper claims that it runs in linear time.
- BALANS requi
* The main idea is relatively simple, and involves correcting for the batch when estimating the affinity matrix, which is then used for the Nyström method.
* The remaining contributions are to propose a computationally efficient way to estimate a submatrix with desirable properties, such as having good coverage of the biological groups, and having almost-linear runtime.
* The sampling algorithm introduces very few hyperparameters, which facilitates model selection.
* The experimental results are
The theoretical results involve the Moore-Penrose pseudoinverse, but the implementation excludes it for computational reasons.
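For readers unfamiliar with the construction the reviewers refer to, the textbook Nyström approximation with a Moore-Penrose pseudoinverse can be sketched as follows. This is the standard formula, not the authors' implementation (which, per the review above, excludes the pseudoinverse); the function name `nystrom_approx` and the `rcond` cutoff are assumptions made for illustration.

```python
import numpy as np

def nystrom_approx(C, landmark_idx, rcond=1e-10):
    """Textbook Nystrom approximation A_hat = C @ pinv(W) @ C.T.

    C            : (n, m) columns of the affinity matrix at the landmarks.
    landmark_idx : row indices of the m landmarks, so W = C[landmark_idx, :].
    rcond        : relative cutoff for small singular values in the pseudoinverse.
    """
    C = np.asarray(C, dtype=float)
    # (m, m) landmark-by-landmark block of the affinity matrix.
    W = C[landmark_idx, :]
    # The pseudoinverse tolerates a singular or ill-conditioned W.
    W_pinv = np.linalg.pinv(W, rcond=rcond)
    return C @ W_pinv @ C.T
```

When the full matrix is symmetric positive semidefinite and the landmark block has the same rank as the full matrix, this reconstruction is exact; otherwise it is a low-rank approximation whose quality depends on how well the landmarks cover the data.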
Taxonomy
Topics: Cell Image Analysis Techniques · Single-cell and spatial transcriptomics · Digital Imaging for Blood Diseases
