Controllable diffusion-based generation for multi-channel biological data

Haoran Zhang; Mingyuan Zhou; Wesley Tansey

arXiv:2507.02902 · cs.LG · July 8, 2025

TL;DR

This paper introduces a diffusion-based generative model tailored for multi-channel biological data, capable of flexible, controllable generation and reconstruction of missing data across spatially aligned channels.

Contribution

It presents a hierarchical feature injection and attention mechanism for structured biological data, enabling generalization to arbitrary observed and missing channel combinations.

Findings

1. Achieves state-of-the-art results in protein imputation and gene-to-protein prediction.
2. Demonstrates strong generalization to unseen channel configurations.
3. Supports flexible, multi-resolution conditioning on spatially aligned data.

Abstract

Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning…
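The following is a minimal sketch of what multi-resolution conditioning on spatially aligned channels can look like, written in Python with NumPy. It is not the authors' implementation. Three choices here are assumptions made only for illustration: the function name `masked_pyramid`, concatenating a binary availability mask with the zeroed-out channel values, and using 2×2 average pooling as the downsampling operator. Every level is pooled from the same pixel grid, so each conditioning pixel still maps to a fixed spatial region at every resolution. The abstract says simpler conditioning mechanisms lose exactly this correspondence.

```python
import numpy as np


def masked_pyramid(x, observed, levels=3):
    """Build multi-resolution conditioning maps from spatially aligned channels.

    x: float array of shape (C, H, W); H and W must be divisible by 2**(levels-1).
    observed: bool array of shape (C,), True where the channel is observed.
    Returns a list of `levels` arrays of shape (2*C, H / 2**l, W / 2**l):
    the first C maps are channel values with missing channels zeroed, the last
    C maps are the availability mask, so a missing channel is distinguishable
    from an observed channel whose values happen to be zero.
    (Illustrative assumption, not the paper's design.)
    """
    c, h, w = x.shape
    factor = 2 ** (levels - 1)
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by 2**(levels-1)")
    mask = observed.astype(x.dtype)[:, None, None]  # (C, 1, 1)
    cond = np.concatenate([x * mask, np.broadcast_to(mask, (c, h, w))], axis=0)
    pyramid = [cond]
    for _ in range(1, levels):
        k, hh, ww = pyramid[-1].shape
        # 2x2 average pooling: each coarse pixel summarizes the same 2x2 block
        # in every channel, so spatial correspondence is preserved across scales.
        coarse = pyramid[-1].reshape(k, hh // 2, 2, ww // 2, 2).mean(axis=(2, 4))
        pyramid.append(coarse)
    return pyramid


# Usage: 4 aligned channels on an 8x8 grid, channels 1 and 3 missing.
x = np.random.default_rng(0).random((4, 8, 8))
observed = np.array([True, False, True, False])
for level in masked_pyramid(x, observed):
    print(level.shape)  # (8, 8, 8), then (8, 4, 4), then (8, 2, 2)
```

In a U-Net-style denoiser, each level would be injected into the block that runs at the matching resolution. This sketch does not model the attention mechanism that mixes information across channels.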

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1) This paper combines two different methodological innovations in a quite ingenious way, effectively addressing the spatial and inter-channel complexity of biological data.
2) The resulting model demonstrates versatility across multiple domains, including spatial proteomics, single-cell omics, and MRI modality synthesis, showing strong generalization and scalability.
3) Finally, the presented evaluation is quite comprehensive. I particularly appreciate the ablation studies that assess the in…

Weaknesses

1) None of the comparisons reported in Tables 1 to 3 include an assessment of statistical significance. This makes it difficult to gauge whether the differences in performance are actually significant.
2) There is no biologically grounded evaluation of the imputed data. For example, are known protein markers expressed in their corresponding cells?
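Reviewer 01's first point can be addressed with a paired test on per-sample errors, because all methods are evaluated on the same test items. Below is a hedged sketch of a two-sided sign-flip permutation test in Python with NumPy. Two things here are illustrative assumptions, not taken from the paper: the function name, and the premise that per-sample errors (for example, per image or per cell) are available for each method.

```python
import numpy as np


def paired_permutation_test(err_a, err_b, n_perm=10_000, seed=0):
    """Two-sided paired sign-flip permutation test.

    err_a, err_b: 1-D arrays of per-sample errors for two methods evaluated on
    the same test items, in the same order. Returns (mean difference a - b,
    p-value). Under the null hypothesis of no difference, the sign of each
    paired difference is equally likely to be + or -.
    """
    a = np.asarray(err_a, dtype=float)
    b = np.asarray(err_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape or a.size == 0:
        raise ValueError("expected two non-empty 1-D arrays of equal length")
    d = a - b
    observed = d.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)  # null distribution of the mean difference
    # Count permutations at least as extreme; the +1 terms include the observed
    # labelling itself, so the p-value is never exactly zero.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p_value)


# Usage with made-up per-image errors (not results from the paper):
rng = np.random.default_rng(1)
base = rng.random(100)
diff, p = paired_permutation_test(base, base + 0.05 + 0.01 * rng.standard_normal(100))
print(f"mean difference {diff:.3f}, p = {p:.4f}")
```

A paired test removes per-item difficulty from the comparison, and that difficulty is often the dominant source of variance across test images. Comparing two means with independent-sample intervals ignores this pairing.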

Reviewer 02 · Rating 0 · Confidence 4

Strengths

1. The idea of developing a framework capable of controllably generating multi-channel biological data using diffusion models is interesting.

Weaknesses

1. The paper is quite obscure, and its objective remains unclear. The title suggests that it focuses on developing a generative framework for multi-channel biological data, but the type of data is not specified. I assumed the authors were referring to images, yet in the experiments they attempt to predict protein expression from paired scRNA-seq data, and later they evaluate their method on MRI images. This inconsistency makes the overall methodology difficult to understand and significantly unde…

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- **Problem relevance.** Training with random channel masking yields a single model that accepts arbitrary observed subsets, which makes it flexible. The union and intersection results support cross-dataset integration under partial channel overlap.
- **Strong empirical results.** Where reported, the method consistently outperforms baselines. The experiments are broad, spanning single- and multi-dataset setups and including hybrid controls.
- **Ablations.** Stepwise ablations and ControlNet/BrushNet hybrids hel…
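The random channel masking described under the first strength can be sketched as follows. The sketch assumes that each channel is hidden independently with probability `p_mask` and that the loss is computed only on hidden channels. The function names `sample_channel_mask` and `masked_mse` are hypothetical, and the paper's actual masking distribution and objective may differ. For example, a diffusion model would typically score predicted noise rather than reconstructed values.

```python
import numpy as np


def sample_channel_mask(n_channels, p_mask, rng):
    """Hide each channel independently with probability p_mask.

    Returns a bool array of shape (n_channels,): True = observed (used as the
    condition), False = missing (to be generated). Draws are resampled until at
    least one channel is observed and at least one is missing, so every draw is
    a valid imputation task. (Hypothetical masking scheme.)
    """
    if n_channels < 2 or not 0.0 < p_mask < 1.0:
        raise ValueError("need n_channels >= 2 and 0 < p_mask < 1")
    while True:
        observed = rng.random(n_channels) >= p_mask
        if observed.any() and not observed.all():
            return observed


def masked_mse(pred, target, observed):
    """Mean squared error over missing channels only; arrays are (C, H, W)."""
    missing = ~observed
    return float(np.mean((pred[missing] - target[missing]) ** 2))


# Usage: one hypothetical training draw for a 6-channel sample.
rng = np.random.default_rng(0)
observed = sample_channel_mask(6, p_mask=0.5, rng=rng)
print(observed)
```

Resampling degenerate draws keeps every batch element informative, because an all-observed sample contributes no loss and an all-missing one provides no condition.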

Weaknesses

- **Subset-size stress tests are missing.** One of the core claims is robustness to arbitrary observed subsets, but there is no sweep of performance against the number of observed channels or the masking probability p, nor any targeted leave-one-group-out tests per channel family. The single- vs. multi-channel and union vs. intersection comparisons are positive but only partial evidence.
- **Efficiency evidence.** Table 1 lists SiD (1-step) with near-identical accuracy and claims a speedup of two orders of magnitude, but there is no wall-clock analysis for readers…
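The missing stress tests could be organized as a hypothetical protocol sketch with three parts:

- a sweep over the number of observed channels k,
- a leave-one-group-out split over channel families,
- a harness that records both error and wall-clock time per call.

`impute_fn`, the `groups` mapping, and the use of mean squared error on missing channels are assumptions. Any real sampler, such as a multi-step sampler or the one-step SiD variant listed in Table 1, would be plugged in as `impute_fn`.

```python
import time

import numpy as np


def subset_size_sweep(n_channels, n_per_size, rng):
    """Yield (k, observed_mask) with k random observed channels, k = 1..C-1."""
    for k in range(1, n_channels):
        for _ in range(n_per_size):
            observed = np.zeros(n_channels, dtype=bool)
            observed[rng.choice(n_channels, size=k, replace=False)] = True
            yield k, observed


def leave_group_out(groups, n_channels):
    """Yield (group_name, observed_mask) hiding one channel family at a time."""
    for name, members in groups.items():
        observed = np.ones(n_channels, dtype=bool)
        observed[list(members)] = False
        yield name, observed


def sweep_errors(impute_fn, x, masks):
    """Return {key: (mean MSE on missing channels, mean seconds per call)}.

    impute_fn(x_in, observed) -> array shaped like x is a hypothetical model
    interface; only the channels with observed == False are scored.
    """
    per_key = {}
    for key, observed in masks:
        # Zero the missing channels so the model cannot see the targets.
        x_in = np.where(observed[:, None, None], x, 0.0)
        start = time.perf_counter()
        pred = impute_fn(x_in, observed)
        elapsed = time.perf_counter() - start
        err = float(np.mean((pred[~observed] - x[~observed]) ** 2))
        per_key.setdefault(key, []).append((err, elapsed))
    return {key: tuple(np.mean(vals, axis=0).tolist()) for key, vals in per_key.items()}


# Usage with a trivial stand-in "model" that predicts zeros for every channel.
def zero_model(x_in, observed):
    return np.zeros_like(x_in)


rng = np.random.default_rng(0)
x = rng.random((5, 16, 16))
print(sweep_errors(zero_model, x, subset_size_sweep(5, n_per_size=4, rng=rng)))
groups = {"group_a": [0, 1], "group_b": [2, 3, 4]}  # hypothetical channel families
print(sweep_errors(zero_model, x, leave_group_out(groups, 5)))
```

The harness times only the `impute_fn` call, which separates sampling cost from data loading. For GPU models, the device would also need to be synchronized before reading the clock, because kernel launches are asynchronous.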
