Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
Jarod Duret (LIA), Yannick Estève (LIA), Titouan Parcollet (CAM)

TL;DR
This paper investigates how the selection of discrete speech units affects the performance of textless speech-to-speech translation systems, revealing that optimal units for resynthesis differ from those for translation quality.
Contribution
It provides a detailed analysis of target speech unit selection criteria and highlights the discrepancy between units optimized for resynthesis versus translation performance.
Findings
Units good for speech resynthesis do not always improve translation.
Discrepancy in optimization criteria impacts translation system performance.
The study evaluates units on downstream tasks: automatic speech recognition, speech synthesis, speaker recognition, and emotion recognition.
Abstract
Recent advancements in textless speech-to-speech translation systems have been driven by the adoption of self-supervised learning techniques. Although most state-of-the-art systems adopt a similar architecture to transform source language speech into sequences of discrete representations in the target language, the criteria for selecting these target speech units remain an open question. This work explores the selection process through a study of downstream tasks such as automatic speech recognition, speech synthesis, speaker recognition, and emotion recognition. Interestingly, our findings reveal a discrepancy in the optimization of discrete speech units: units that perform well in resynthesis are not necessarily those that improve translation quality. This discrepancy underscores the nuanced complexity of target feature selection and its impact on the…
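The abstract does not say how the discrete speech units are obtained. As a rough illustration only, the sketch below assumes one common setup: each frame-level feature vector is assigned to its nearest centroid in a pre-trained codebook, and consecutive repeats are then collapsed. The function names, the toy centroids, and the toy frames are all hypothetical and are not taken from the paper.

```python
from itertools import groupby
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    def sq_dist(centroid: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(frame, centroid))

    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def quantize(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map every frame to a discrete unit ID (its nearest centroid index)."""
    return [nearest_centroid(frame, centroids) for frame in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Toy 2-D data (assumption): real features would be high-dimensional
# self-supervised representations, and the codebook would be learned by clustering.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.1, 0.8), (0.1, 1.9), (0.0, 0.1)]

units = quantize(frames, centroids)
reduced = collapse_repeats(units)
print(units)    # [0, 0, 1, 1, 2, 0]
print(reduced)  # [0, 1, 2, 0]
```

In a setup like this, the choice of feature extractor, layer, and codebook size determines which unit sequence the translation model has to predict, which is the kind of selection question the abstract raises.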
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Feature Selection
