Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Peidong Wang; Jian Xue; Jinyu Li; Junkun Chen; Aswin Shanmugam Subramanian

arXiv:2406.10276 · cs.CL · June 18, 2024

Open Access

TL;DR

This paper introduces a linear input network that feeds source-language information into language-agnostic many-to-one speech translation models, improving translation for the specified language without sacrificing quality on the other languages.

Contribution

The paper proposes a simple linear input network, initialized as an identity matrix, that integrates language-specific information while preserving the original model's performance.

Findings

01 Enhanced language-specific translation accuracy

02 Maintained overall translation quality

03 Effective integration of language info with minimal model changes

Abstract

Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the other languages. We accomplish this by introducing a simple and effective linear input network. The linear input network is initialized as an identity matrix, which ensures that the model can perform as well as, or better than, the original model. Experimental results show that the proposed method can successfully enhance the specified language, while keeping the language-agnostic ability of the many-to-one ST models.
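
The identity initialization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say where the network sits or how it is selected, so the sketch assumes one linear layer per specified language applied to each input feature frame, with frames passed through unchanged when no language is given. The names `LinearInputNetwork` and `apply_input_network`, the zero bias, and the language code "de" are hypothetical, and plain Python lists stand in for the tensors a real model would use.

```python
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def identity(dim: int) -> Matrix:
    """Return a dim x dim identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


class LinearInputNetwork:
    """Hypothetical linear layer y = W x + b over one feature frame.

    W starts as the identity and b as zeros (the zero bias is an assumption),
    so before any fine-tuning the layer returns its input unchanged.
    """

    def __init__(self, dim: int) -> None:
        self.weight: Matrix = identity(dim)
        self.bias: Vector = [0.0] * dim

    def __call__(self, frame: Vector) -> Vector:
        return [
            sum(w * x for w, x in zip(row, frame)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def apply_input_network(
    frames: List[Vector],
    lang: Optional[str],
    networks: Dict[str, LinearInputNetwork],
) -> List[Vector]:
    """Transform frames with the network for `lang`, if one exists.

    Assumption: with no language given (or no network for it), the frames
    go to the unchanged language-agnostic model as they are.
    """
    net = networks.get(lang) if lang is not None else None
    if net is None:
        return frames
    return [net(frame) for frame in frames]


# Toy input: two 3-dimensional feature frames.
frames = [[0.5, -1.0, 2.0], [0.0, 3.0, -0.25]]
networks = {"de": LinearInputNetwork(3)}

# At initialization the specified-language path equals the agnostic path.
same_at_init = apply_input_network(frames, "de", networks) == frames
unchanged_without_lang = apply_input_network(frames, None, networks) == frames
```

Under these assumptions, training only the weights of the per-language layer starts from the original model's behavior, which matches the abstract's claim that the model can perform at least as well as the original.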

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Interpreting and Communication in Healthcare