scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
Chenglei Yu, Chuanrui Wang, Bangyan Liao, Tailin Wu

TL;DR
scDFM introduces a distributional flow matching model with a novel architecture to predict cellular responses to perturbations, effectively handling noise and population shifts in single-cell data.
Contribution
The paper presents a new generative framework, scDFM, that models full cell population distributions and incorporates a perturbation-aware transformer for improved robustness and accuracy.
Findings
Outperforms prior methods on multiple benchmarks.
Reduces mean squared error by 19.6% in combinatorial settings.
Demonstrates strong generalization to unseen perturbations.
Abstract
A central goal in systems biology and drug discovery is to predict the transcriptional response of cells to perturbations. This task is challenging due to the noisy and sparse nature of single-cell measurements, as well as the fact that perturbations often induce population-level shifts rather than changes in individual cells. Existing deep learning methods typically assume cell-level correspondences, limiting their ability to capture such global effects. We present scDFM, a generative framework based on conditional flow matching that models the full distribution of perturbed cells conditioned on control states. By incorporating a maximum mean discrepancy (MMD) objective, our method aligns perturbed and control populations beyond cell-level correspondences. To further improve robustness to sparsity and noise, we introduce the Perturbation-Aware Differential Transformer…
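The abstract combines two ingredients: conditional flow matching from control to perturbed expression, and an MMD term that compares whole populations. The NumPy sketch below shows both under stated assumptions. It is not the authors' implementation. It uses the straight-line interpolant, which one review notes the paper adopts. Because the two populations are unpaired, it pairs control and perturbed cells at random within a batch. A plain callable `velocity_fn` stands in for the PAD-Transformer. The Gaussian-kernel MMD compares generated cells with observed perturbed cells. The helper names and parameters (`total_loss`, `mmd_weight`, `n_steps`, `bandwidth`) are hypothetical.

```python
import numpy as np


def linear_interpolant(x0, x1, t):
    """Point at time t on the straight path from control x0 to perturbed x1.

    x0, x1: (n_cells, n_genes) expression batches; t: (n_cells, 1) in [0, 1].
    """
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x0, x1, rng):
    """Monte Carlo conditional flow matching loss for one mini-batch.

    For the linear path the target velocity is x1 - x0, independent of t.
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    x_t = linear_interpolant(x0, x1, t)
    residual = velocity_fn(x_t, t) - (x1 - x0)
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def euler_sample(velocity_fn, x0, n_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with explicit Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = np.full((x.shape[0], 1), step * dt)
        x = x + dt * velocity_fn(x, t)
    return x


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def kernel(a, b):
        sq_dists = (
            np.sum(a ** 2, axis=1)[:, None]
            + np.sum(b ** 2, axis=1)[None, :]
            - 2.0 * a @ b.T
        )
        return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


def total_loss(velocity_fn, x_ctrl, x_pert, rng, mmd_weight=0.1, n_steps=20):
    """Hypothetical combined objective: flow matching + weighted population MMD.

    Assumes equal batch sizes; the unpaired populations are matched by a random
    permutation (an assumption, not necessarily the paper's coupling).
    """
    x1 = x_pert[rng.permutation(x_pert.shape[0])]
    loss_fm = cfm_loss(velocity_fn, x_ctrl, x1, rng)
    x_gen = euler_sample(velocity_fn, x_ctrl, n_steps)
    return loss_fm + mmd_weight * rbf_mmd2(x_gen, x_pert)


# Usage with a toy velocity field that shifts every gene by +1.
rng = np.random.default_rng(0)
x_ctrl = rng.normal(size=(64, 5))
x_pert = rng.normal(loc=1.0, size=(64, 5))
shift = lambda x, t: np.ones_like(x)
print(total_loss(shift, x_ctrl, x_pert, rng))
```

The MMD term compares batches as sets, so it needs no cell-level correspondence. In a trainable model, the generated batch would also have to be differentiable with respect to the network parameters for this term to supply gradients. This sketch only evaluates the loss value.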
Peer Reviews
Decision: ICLR 2026 Poster
The paper is well written and easy to follow. The idea is original.
The model does not learn the gene-gene interaction network; instead, the network is given to it as a biologically grounded prior. This graph is constructed from simple absolute Pearson correlation on the training data, which prevents the model from discovering novel, non-obvious, or non-linear gene relationships that basic correlation does not capture. The flow matching framework learns a path from control to perturbed states. As the paper acknowledges, it uses a simple linear interpolant as the reference path, which…
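The review above states that the gene-gene graph is built from absolute Pearson correlation on the training data. A minimal NumPy sketch of such a construction follows. Keeping the k most correlated genes for each gene is an assumption, since the sparsification rule is not stated here. The names `pearson_gene_graph` and `k` are hypothetical.

```python
import numpy as np


def pearson_gene_graph(expr, k=2):
    """k-nearest-neighbour gene graph from absolute Pearson correlation.

    expr: (n_cells, n_genes) training expression matrix; requires k < n_genes.
    Returns a boolean (n_genes, n_genes) adjacency where row g marks the k genes
    with the largest |r| to gene g (self excluded). The top-k rule is a
    hypothetical choice, not necessarily the paper's.
    """
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    corr = np.nan_to_num(corr, nan=0.0)  # constant genes yield NaN; treat as uncorrelated
    np.fill_diagonal(corr, -np.inf)      # never select a gene as its own neighbour
    neighbours = np.argsort(corr, axis=1)[:, -k:]  # indices of the k largest |r|
    adj = np.zeros(corr.shape, dtype=bool)
    adj[np.arange(corr.shape[0])[:, None], neighbours] = True
    return adj


# Usage: gene 1 is an almost exact negative copy of gene 0, so |r| links them.
rng = np.random.default_rng(1)
g0 = rng.normal(size=200)
expr = np.column_stack([g0, -g0 + 0.01 * rng.normal(size=200),
                        rng.normal(size=200), rng.normal(size=200)])
print(pearson_gene_graph(expr, k=1).astype(int))
```

Taking the absolute value links strongly anti-correlated genes as well as correlated ones. Pearson correlation still measures only linear association. In a toy pair where y = x² and x is symmetric about zero, |r| is close to zero, so this construction would not link the two genes. This is the limitation the review raises.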
- The paper is well written and easy to understand.
- This paper addresses two interesting and important questions in a single framework: genetic and molecular perturbation.
- The framework uses a biological prior, which strengthens the model.
- The authors use flow matching instead of the diffusion models and autoencoders used by prior methods; this is an interesting architectural choice, and the reasoning behind it is sound.
The main weakness of this paper is Section 4 (Experiments):
- The experimental results are not strong. In most cases, scDFM only barely outperforms the baselines.
- The baselines used in this paper are not the latest or strongest in the field.
- The dataset is limited, and since the model does not show strong results, it is unclear how scDFM would perform on other datasets.
The framework for molecular perturbation has some limitations:
- It cannot generalize to unseen molecules.
- The…
1. The paper is well written and clear, with a strong visual presentation and a reproducibility statement.
2. Applying flow matching directly in the expression space is a reasonable and technically clean adaptation of continuous generative models to the single-cell domain.
3. The idea of incorporating a population-level regularizer (MMD) is conceptually sound and aligns with the motivation to capture distributional, rather than per-cell, perturbation effects.
4. The inclusion of differential atten…
1. Limited novelty and incremental contribution: Flow matching for biological state modeling has already appeared in multiple works. The MMD term is a straightforward sample-based regularizer with no new theoretical or algorithmic insight. The PAD-Transformer largely reuses existing building blocks (Differential Transformer + GEARS-style gene-masking). As a result, the paper reads as a combination of known components rather than a fundamentally new modeling principle.
2. Outdated benchmarking a…
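The review above describes the PAD-Transformer as a Differential Transformer combined with GEARS-style gene masking. Differential attention subtracts a second softmax attention map, scaled by λ, from the first before applying it to the values. The single-head NumPy sketch below shows that operation with an optional boolean gene mask. The mask is a hypothetical reading of "gene-masking", not the paper's exact scheme. λ is a fixed scalar here, and the sketch omits the learnable re-parameterization of λ and the per-head normalization used in the original Differential Transformer. All function and parameter names are illustrative.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax; -inf entries receive zero weight."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def differential_attention(x, w_q1, w_k1, w_q2, w_k2, w_v, lam, mask=None):
    """Single-head differential attention over gene tokens.

    x: (n_genes, d_model) token embeddings; w_q*, w_k*: (d_model, d_head);
    w_v: (d_model, d_v); lam: fixed scalar weight on the second map.
    mask: optional boolean (n_genes, n_genes), True = may attend (hypothetical
          gene-graph mask; every row needs at least one True, e.g. a self-loop).
    Returns (n_genes, d_v) outputs.
    """
    d_head = w_q1.shape[1]
    s1 = (x @ w_q1) @ (x @ w_k1).T / np.sqrt(d_head)
    s2 = (x @ w_q2) @ (x @ w_k2).T / np.sqrt(d_head)
    if mask is not None:
        s1 = np.where(mask, s1, -np.inf)
        s2 = np.where(mask, s2, -np.inf)
    attn = softmax(s1) - lam * softmax(s2)  # difference of two attention maps
    return attn @ (x @ w_v)


# Usage: 4 gene tokens, mask = self-loops plus a random sparse pattern.
rng = np.random.default_rng(2)
n_genes, d_model, d_head = 4, 6, 3
x = rng.normal(size=(n_genes, d_model))
w_q1, w_k1, w_q2, w_k2 = (rng.normal(size=(d_model, d_head)) for _ in range(4))
w_v = rng.normal(size=(d_model, d_model))
mask = np.eye(n_genes, dtype=bool) | (rng.uniform(size=(n_genes, n_genes)) < 0.5)
out = differential_attention(x, w_q1, w_k1, w_q2, w_k2, w_v, lam=0.5, mask=mask)
print(out.shape)  # (4, 6)
```

With lam=0 the operation reduces to ordinary scaled dot-product attention. Subtracting the second map is intended to cancel attention weight that both maps assign in common, which the Differential Transformer motivates as noise cancellation.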
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Gene Regulatory Network Analysis
