MeetDot: Videoconferencing with Live Translation Captions

Arkady Arkhangorodsky; Christopher Chu; Scot Fang; Yiqi Huang; Denglin Jiang; Ajay Nagesh; Boliang Zhang; Kevin Knight

arXiv:2109.09577 · cs.CL · September 21, 2021

TL;DR

MeetDot is a videoconferencing system that provides live translation captions in multiple languages, aiming to improve multilingual communication by integrating ASR and MT with user-friendly features and evaluation tools.

Contribution

The paper introduces MeetDot, a modular, open-source videoconferencing system with real-time translation captions, optimized for low latency and user experience, and includes novel evaluation metrics.

Findings

01 Supports 4 languages with integrated ASR and MT

02 Features smooth scrolling and flicker reduction for better user experience

03 Includes an innovative cross-lingual word-guessing game for system evaluation

Abstract

We present MeetDot, a videoconferencing system with live translation captions overlaid on screen. The system aims to facilitate conversation between people who speak different languages, thereby reducing communication barriers between multilingual participants. Currently, our system supports speech and captions in 4 languages and combines automatic speech recognition (ASR) and machine translation (MT) in a cascade. We use the re-translation strategy to translate the streamed speech, resulting in caption flicker. Additionally, our system has very strict latency requirements to have acceptable call quality. We implement several features to enhance user experience and reduce their cognitive load, such as smooth scrolling captions and reducing caption flicker. The modular architecture allows us to integrate different ASR and MT services in our backend. Our system provides an integrated…
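The abstract's point that re-translation causes caption flicker can be illustrated with a small sketch. Everything below is hypothetical and is not taken from MeetDot: `toy_translate` is a stand-in for an MT service, the partial transcripts are made up, and `mask_last_k` is a commonly described heuristic from the re-translation literature (hide the last k tokens of every non-final translation), not necessarily the flicker-reduction method the authors use. The sketch re-translates each growing ASR partial from scratch and counts how many already-displayed tokens a later update rewrites.

```python
from typing import Callable, List

# Hypothetical word-by-word lexicon; a real MT service would replace this.
LEXICON = {"the": "la", "of": "de", "river": "rivière"}


def toy_translate(text: str) -> str:
    """Toy MT stand-in whose output for 'bank' depends on later context."""
    words = text.split()
    out = []
    for w in words:
        if w == "bank":
            # The rendering changes once "river" has been heard.
            out.append("rive" if "river" in words else "banque")
        else:
            out.append(LEXICON.get(w, w))
    return " ".join(out)


def retranslate_stream(partials: List[str],
                       translate: Callable[[str], str]) -> List[List[str]]:
    """Re-translate every growing ASR partial from scratch."""
    return [translate(p).split() for p in partials]


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_erased_tokens(captions: List[List[str]]) -> int:
    """Total displayed tokens that a later update removed or changed."""
    erased = 0
    for prev, cur in zip(captions, captions[1:]):
        erased += len(prev) - common_prefix_len(prev, cur)
    return erased


def mask_last_k(captions: List[List[str]], k: int) -> List[List[str]]:
    """Assumed heuristic: hide the last k tokens of each non-final caption."""
    if not captions:
        return []
    shown = [c[:max(len(c) - k, 0)] for c in captions[:-1]]
    shown.append(captions[-1])  # the final translation is shown in full
    return shown


# Made-up stream of growing ASR partial transcripts.
partials = ["the", "the bank", "the bank of", "the bank of the river"]
captions = retranslate_stream(partials, toy_translate)
unmasked_erasures = count_erased_tokens(captions)
masked_erasures = count_erased_tokens(mask_last_k(captions, 2))
```

In this toy stream the unmasked captions rewrite two tokens that were already on screen, while masking the last two tokens of each intermediate caption avoids any rewrite at the cost of showing text later. That trade-off between flicker and latency is what the abstract's strict latency requirement constrains.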

Taxonomy

Topics: Speech and dialogue systems · Video Analysis and Summarization · Multimodal Machine Learning Applications