Video Editing for Audio-Visual Dubbing

Binyamin Manela; Sharon Gannot; Ethan Fetaya

arXiv:2505.23406·cs.CV·May 30, 2025

TL;DR

EdiDub is a content-aware video editing framework for audio-visual dubbing. It synchronizes facial movements with new speech while preserving original scene details, and it outperforms existing methods on multiple benchmarks, including an occluded-lip dataset.

Contribution

Introduces EdiDub, a content-aware editing approach for visual dubbing that maintains scene integrity and enhances synchronization accuracy.

Findings

01 Significantly improves identity preservation and lip synchronization.

02 Outperforms existing methods on multiple benchmarks, including a challenging occluded-lip dataset.

03 Human evaluators favor EdiDub's naturalness and synchronization.

Abstract

Visual dubbing, the synchronization of facial movements with new speech, is crucial for making content accessible across different languages, enabling broader global reach. However, current methods face significant limitations. Existing approaches often generate talking faces, hindering seamless integration into original scenes, or employ inpainting techniques that discard vital visual information like partial occlusions and lighting variations. This work introduces EdiDub, a novel framework that reformulates visual dubbing as a content-aware editing task. EdiDub preserves the original video context by utilizing a specialized conditioning scheme to ensure faithful and accurate modifications rather than mere copying. On multiple benchmarks, including a challenging occluded-lip dataset, EdiDub significantly improves identity preservation and synchronization. Human evaluations further…
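The contrast the abstract draws between inpainting and editing can be illustrated with a toy example. The sketch below is not EdiDub's conditioning scheme, which the abstract does not detail; the function names, the frame size, and the `mouth_box` coordinates are all hypothetical. It only shows why masking the mouth region before generation removes pixels (for example a hand or microphone covering the lips, or local lighting) that an editing formulation still has access to.

```python
import numpy as np

def inpainting_input(frame: np.ndarray, mouth_box: tuple[int, int, int, int]) -> np.ndarray:
    """Hypothetical mask-then-inpaint input: the mouth region is zeroed out."""
    top, bottom, left, right = mouth_box
    masked = frame.copy()
    masked[top:bottom, left:right] = 0  # occluders and lighting here are lost
    return masked

def editing_input(frame: np.ndarray) -> np.ndarray:
    """Hypothetical editing input: the full frame is kept as context."""
    return frame.copy()

# Synthetic 64x64 RGB frame with strictly non-zero pixel values.
rng = np.random.default_rng(0)
frame = rng.integers(1, 256, size=(64, 64, 3), dtype=np.uint8)
mouth_box = (40, 56, 16, 48)  # (top, bottom, left, right), assumed for illustration

masked = inpainting_input(frame, mouth_box)
full = editing_input(frame)

# Count pixels whose original content is no longer visible to the model.
lost_inpainting = int(np.count_nonzero((masked != frame).any(axis=-1)))
lost_editing = int(np.count_nonzero((full != frame).any(axis=-1)))
print(lost_inpainting, lost_editing)
```

In this toy setup the inpainting input discards every pixel inside the box, while the editing input discards none. The harder part, which the abstract attributes to a specialized conditioning scheme, is making the model modify the lips rather than copy them from the unmasked frame.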

Code & Models

Repositories

edidub/edidub-results
none · Official

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis

Methods: Inpainting