Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge

Nimrod Berman; Omkar Joglekar; Eitan Kosman; Dotan Di Castro; Omri Azencot

arXiv:2510.20819·cs.CV·October 28, 2025

TL;DR

This paper introduces the Latent Denoising Diffusion Bridge Model (LDDBM), a general-purpose latent diffusion framework for modality translation. It avoids the restrictive assumptions common in existing approaches: shared dimensionality, Gaussian source priors, and modality-specific architectures.

Contribution

The work presents a latent-variable extension of Denoising Diffusion Bridge Models, trained with contrastive and predictive losses, that learns a bridge between modalities in a shared latent space and supports domain-agnostic modality translation across multiple tasks.
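
As a rough illustration of what a contrastive loss over paired latents can look like, the sketch below implements a generic InfoNCE-style objective in plain Python. This page does not specify the form of the paper's contrastive or predictive loss, so everything here is an assumption made for illustration: the function names `cosine` and `info_nce`, the cosine-similarity scoring, and the `temperature` parameter are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(src_latents, tgt_latents, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired latents.

    Assumption (not from the source): src_latents[i] and tgt_latents[i]
    form the positive pair, and every other target in the batch is a negative.
    """
    n = len(src_latents)
    total = 0.0
    for i in range(n):
        # Similarity of source i to every target, scaled by the temperature.
        logits = [cosine(src_latents[i], t) / temperature for t in tgt_latents]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the matching target.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    source = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    # Aligned pairs should yield a lower loss than mismatched pairs.
    print(info_nce(source, aligned) < info_nce(source, swapped))
```

A loss of this kind pulls the latents of paired samples together and pushes unpaired ones apart, which is one common way to align two encoders in a shared latent space.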

Findings

01 Supports arbitrary modality pairs

02 Achieves strong results on diverse tasks

03 Establishes new baseline in modality translation

Abstract

Recent advances in generative modeling have positioned diffusion models as state-of-the-art tools for sampling from complex data distributions. While these models have shown remarkable success across single-modality domains such as images and audio, extending their capabilities to Modality Translation (MT), translating information across different sensory modalities, remains an open challenge. Existing approaches often rely on restrictive assumptions, including shared dimensionality, Gaussian source priors, and modality-specific architectures, which limit their generality and theoretical grounding. In this work, we propose the Latent Denoising Diffusion Bridge Model (LDDBM), a general-purpose framework for modality translation based on a latent-variable extension of Denoising Diffusion Bridge Models. By operating in a shared latent space, our method learns a bridge between arbitrary…

Videos

Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge (SlidesLive)