DepMicroDiff: Diffusion-Based Dependency-Aware Multimodal Imputation for Microbiome Data
Rabeya Tus Sadia, Qiang Cheng

TL;DR
DepMicroDiff is a diffusion-based, dependency-aware framework that combines transformers with LLM-encoded patient metadata to improve microbiome data imputation, outperforming existing methods across multiple cancer datasets.
Contribution
The paper presents a diffusion-based model that pairs a Dependency-Aware Transformer (DAT) with LLM-encoded patient metadata, improving the accuracy and robustness of microbiome imputation.
Findings
Achieves higher Pearson correlation than existing methods (up to 0.712)
Attains cosine similarity up to 0.812
Demonstrates robustness across diverse cancer datasets
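The two metrics reported above can be computed as in the following minimal sketch. It assumes the scores are taken over entries that were hidden from the model and then imputed; the paper's exact evaluation protocol is not stated in this summary, so the masking rate, the synthetic data, and the helper name masked_metrics are all illustrative assumptions rather than the authors' setup.

```python
import numpy as np

def masked_metrics(truth, imputed, mask):
    """Pearson correlation and cosine similarity over held-out entries.

    mask is True where a value was hidden from the model and then imputed.
    (Hypothetical helper, not taken from the paper.)
    """
    t = truth[mask].astype(float)
    p = imputed[mask].astype(float)
    pearson = float(np.corrcoef(t, p)[0, 1])
    cosine = float(t @ p / (np.linalg.norm(t) * np.linalg.norm(p)))
    return pearson, cosine

# Synthetic stand-ins only: skewed "abundances" and a noisy "imputation".
rng = np.random.default_rng(0)
truth = rng.gamma(shape=0.5, scale=2.0, size=(20, 50))
mask = rng.random(truth.shape) < 0.2              # hide roughly 20% of entries
imputed = truth + rng.normal(scale=0.5, size=truth.shape)

pearson, cosine = masked_metrics(truth, imputed, mask)
print(f"Pearson: {pearson:.3f}, cosine: {cosine:.3f}")
```

Pearson correlation centers both vectors before comparing them, while cosine similarity does not, so the two can diverge on sparse, non-negative abundance data.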
Abstract
Microbiome data analysis is essential for understanding host health and disease, yet its inherent sparsity and noise pose major challenges for accurate imputation, hindering downstream tasks such as biomarker discovery. Existing imputation methods, including recent diffusion-based models, often fail to capture the complex interdependencies between microbial taxa and overlook contextual metadata that can inform imputation. We introduce DepMicroDiff, a novel framework that combines diffusion-based generative modeling with a Dependency-Aware Transformer (DAT) to explicitly capture both mutual pairwise dependencies and autoregressive relationships. DepMicroDiff is further enhanced by VAE-based pretraining across diverse cancer datasets and conditioning on patient metadata encoded via a large language model (LLM). Experiments on TCGA microbiome datasets show that DepMicroDiff substantially…
Taxonomy
Topics: Gut microbiota and health · Gene expression and cancer classification
