JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Anthony Chen; Naomi Ken Korem; Gal Zeevi; Tavi Halperin; Matan Ben Yosef; Urska Jelercic; Ofir Bibi; Or Patashnik; Daniel Cohen-Or

arXiv:2601.22143 · cs.GR · May 12, 2026

TL;DR

This paper introduces a single-model video dubbing method that adapts a joint audio-visual diffusion model with a lightweight LoRA, producing high-quality, synchronized dubbing while preserving speaker identity.

Contribution

The work presents a new approach that adapts a foundational audio-visual diffusion model for video dubbing using LoRA, simplifying the pipeline and improving robustness.
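For readers unfamiliar with LoRA, the general idea can be sketched in a few lines: a pretrained weight matrix stays frozen, and only a small low-rank correction is trained on top of it. The snippet below is a generic, minimal illustration in plain Python, not the paper's implementation; the class name `LoRALinear`, the `rank` and `alpha` parameters, and the toy dimensions are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W (d_out x d_in) plus a low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, never updated
        self.scale = alpha / rank     # common LoRA scaling convention
        # A starts small and random, B starts at zero,
        # so the adapter leaves the base model unchanged at initialization.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """Apply W x + scale * B (A x) to a vector x of length d_in."""
        col = [[v] for v in x]
        base = matmul(self.weight, col)
        delta = matmul(self.B, matmul(self.A, col))
        return [b[0] + self.scale * d[0] for b, d in zip(base, delta)]

    def trainable_params(self):
        """Count only the adapter parameters (A and B)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage: a 4x4 identity weight with a rank-2 adapter.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=2)
output = layer.forward([1.0, 2.0, 3.0, 4.0])
```

Because `B` is initialized to zero, the adapted layer initially reproduces the frozen layer's output; training then moves only `A` and `B`. In a real model the saving comes from the rank being much smaller than the layer width, which this toy 4x4 case does not demonstrate.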

Findings

01 Produces high-quality dubbed videos with better lip sync.

02 Preserves speaker identity and visual fidelity.

03 Outperforms existing dubbing pipelines in robustness.

Abstract

Audio-Visual Foundation Models, which are pretrained to jointly generate sound and visual content, have recently shown an unprecedented ability to model multi-modal generation and editing, opening new opportunities for downstream tasks. Among these tasks, video dubbing could greatly benefit from such priors, yet most existing solutions still rely on complex, task-specific pipelines that struggle in real-world settings. In this work, we introduce a single-model approach that adapts a foundational audio-video diffusion model for video-to-video dubbing via a lightweight LoRA. The LoRA enables the model to condition on an input audio-video while jointly generating translated audio and synchronized facial motion. To train this LoRA, we leverage the generative model itself to synthesize paired multilingual videos of the same speaker. Specifically, we generate multilingual videos with language…

Code & Models

Repositories

justdubit/just-dub-it (GitHub)