SwissADT: An Audio Description Translation System for Swiss Languages
Lukas Fischer, Yingqiang Gao, Alexa Lintner, Sarah Ebling

TL;DR
SwissADT is a novel multilingual audio description translation system for Swiss languages that leverages well-crafted AD data, video context, and Large Language Models to improve accessibility for visually impaired populations.
Contribution
This work introduces SwissADT, the first ADT system for Swiss languages, utilizing video context and LLMs to enhance translation quality in a multilingual setting.
Findings
SwissADT achieves promising translation quality based on automatic and human evaluations.
Incorporating video information can potentially improve ADT outputs.
Large Language Models effectively support multilingual ADT tasks.
Abstract
Audio description (AD) is a crucial accessibility service provided to blind persons and persons with visual impairment, designed to convey visual information in acoustic form. Despite recent advancements in multilingual machine translation research, the lack of well-crafted and time-synchronized AD data impedes the development of audio description translation (ADT) systems that address the needs of multilingual countries such as Switzerland. Furthermore, since the majority of ADT systems rely solely on text, uncertainty exists as to whether incorporating visual information from the corresponding video clips can enhance the quality of ADT outputs. In this work, we present SwissADT, the first ADT system implemented for three main Swiss languages and English. By collecting well-crafted AD data augmented with video clips in German, French, Italian, and English, and leveraging the power of…
Taxonomy
Topics: Natural Language Processing Techniques · Subtitles and Audiovisual Media · Speech Recognition and Synthesis
