TRAVID: An End-to-End Video Translation Framework

Prottay Kumar Adhikary; Bandaru Sugandhi; Subhojit Ghimire; Santanu Pal; Partha Pakray

arXiv:2309.11338·cs.CL·September 21, 2023

Open Access

TL;DR

TRAVID is an end-to-end video translation system that translates speech, synchronizes lip movements, and uses voice cloning to improve immersive learning experiences in multilingual educational contexts.

Contribution

The paper introduces a system that combines speech translation, lip synchronization, and voice cloning for multilingual video education.

Findings

1. Effective translation with lip-sync in low-resource settings
2. Enhanced learning experience through synchronized lip movements
3. Successful application in Indian language educational videos

Abstract

In today's globalized world, effective communication with people from diverse linguistic backgrounds has become increasingly crucial. While traditional methods of language translation, such as written text or voice-only translations, can accomplish the task, they often fail to capture the complete context and nuanced information conveyed through nonverbal cues like facial expressions and lip movements. In this paper, we present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker. Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource system settings. By incorporating lip movements that align with the target language and matching them with the speaker's voice using voice cloning techniques, our…

Taxonomy

Topics: Subtitles and Audiovisual Media · Speech and Audio Processing · Multimodal Machine Learning Applications

Methods: fail · ALIGN