TL;DR
This paper introduces a diffusion-based generative model tailored for multi-channel biological data, capable of flexible, controllable generation and reconstruction of missing data across spatially aligned channels.
Contribution
It presents a hierarchical feature injection and attention mechanism for structured biological data, enabling generalization to arbitrary observed and missing channel combinations.
Findings
Achieves state-of-the-art results in protein imputation and gene-to-protein prediction.
Demonstrates strong generalization to unseen channel configurations.
Supports flexible, multi-resolution conditioning on spatially aligned data.
Abstract
Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning…
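The requirement to generalize across arbitrary combinations of observed and missing channels is typically met by masking channels at random during training; one of the reviews on this page notes that the paper trains with random channel masking. The following is a minimal NumPy sketch of that idea only, not the authors' implementation. The function names `random_channel_mask` and `split_channels`, the masking probability `p_mask`, the zero-filling of hidden channels, and the rule that at least one channel is observed and one is missing are all illustrative assumptions.

```python
import numpy as np

def random_channel_mask(num_channels, p_mask, rng):
    """Return a boolean vector where True marks an observed channel.

    Each channel is hidden independently with probability p_mask
    (hypothetical parameter, not taken from the paper).
    """
    observed = rng.random(num_channels) >= p_mask
    # Assumption: keep at least one missing and one observed channel
    # so that every sample has something to condition on and to predict.
    if observed.all():
        observed[rng.integers(num_channels)] = False
    if not observed.any():
        observed[rng.integers(num_channels)] = True
    return observed

def split_channels(x, observed):
    """Split x of shape (C, H, W) into a condition and a target tensor.

    Both outputs keep the full (C, H, W) shape so spatial alignment is
    preserved; channels that do not belong to a tensor are zero-filled.
    """
    m = observed[:, None, None]
    condition = np.where(m, x, 0.0)  # observed channels only
    target = np.where(m, 0.0, x)     # missing channels only
    return condition, target

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8, 8))       # toy 6-channel, 8x8 image
observed = random_channel_mask(6, 0.5, rng)
condition, target = split_channels(x, observed)
```

Drawing a fresh mask for every training sample exposes a single model to many observed/missing splits, which is what allows it to be queried with an arbitrary subset at inference time.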
Peer Reviews
Decision·ICLR 2026 Poster
1) This paper combines two different methodological innovations in a quite ingenious way, effectively addressing the spatial and inter-channel complexity of biological data. 2) The resulting model demonstrates versatility across multiple domains, including spatial proteomics, single-cell omics, and MRI modality synthesis, showing strong generalization and scalability. 3) Finally, the presented evaluation is quite comprehensive. I particularly appreciate the ablation studies that assess the in
1) All comparisons reported in Tables 1 to 3 lack any assessment of statistical significance. This makes it difficult to gauge whether differences in performance are actually significant. 2) There is no biologically grounded evaluation of the imputed data. For example, are known protein markers expressed in their corresponding cells?
1. The idea of developing a framework capable of controllably generating multi-channel biological data using diffusion models is interesting.
1. The paper is quite obscure and its objective remains unclear. The title suggests that it focuses on developing a generative framework for multi-channel biological data, but the type of data is not specified. I assumed the authors were referring to images, yet in the experiments they attempt to predict protein expression from paired scRNA-seq data, and later they evaluate their method on MRI images. This inconsistency makes the overall methodology difficult to understand and significantly unde
- **Problem relevance.** Training with random channel masking yields one model that accepts arbitrary observed subsets, which makes it flexible. The union and intersection result supports cross-dataset integration under partial channel overlap.
- **Strong empirical results.** When reported, the method consistently outperforms baselines. Experiments are broad, spanning single- and multi-dataset setups and including hybrid controls.
- **Ablations.** Stepwise ablations and ControlNet/BrushNet hybrids hel
- **Subset-size stress-tests are missing.** One of the core claims is robustness to arbitrary observed subsets, but there is no sweep of performance vs. number of observed channels or masking probability p, nor a targeted leave-a-group-out evaluation per channel family. The single- vs. multi-channel and union vs. intersection comparisons are positive but partial.
- **Efficiency evidence.** Table 1 lists SiD (1-step) with near-identical accuracy and claims a speedup of two orders of magnitude, but there is no wall-clock analysis for readers
