TRAVID: An End-to-End Video Translation Framework
Prottay Kumar Adhikary, Bandaru Sugandhi, Subhojit Ghimire, Santanu Pal, Partha Pakray

TL;DR
TRAVID is an end-to-end video translation system that translates speech, synchronizes the speaker's lip movements with the translated audio, and uses voice cloning to preserve the speaker's voice, creating more immersive learning experiences in multilingual educational contexts.
Contribution
It introduces a novel system that combines speech translation, lip synchronization, and voice cloning for effective multilingual video education.
Findings
Effective translation with lip-sync in low-resource settings
Enhanced learning experience through synchronized lip movements
Successful application in Indian language educational videos
Abstract
In today's globalized world, effective communication with people from diverse linguistic backgrounds has become increasingly crucial. While traditional methods of language translation, such as written text or voice-only translations, can accomplish the task, they often fail to capture the complete context and nuanced information conveyed through nonverbal cues like facial expressions and lip movements. In this paper, we present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker. Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource system settings. By incorporating lip movements that align with the target language and matching them with the speaker's voice using voice cloning techniques, our…
Taxonomy
Topics: Subtitles and Audiovisual Media · Speech and Audio Processing · Multimodal Machine Learning Applications
Methods: fail · ALIGN
