Breeding Gender-aware Direct Speech Translation Systems
Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi

TL;DR
This paper explores methods to incorporate speaker gender information into direct speech translation models to improve gender translation accuracy, demonstrating significant performance gains over gender-unaware models.
Contribution
It introduces and compares approaches for integrating speaker gender cues into direct speech translation systems, addressing gender bias and enhancing translation accuracy.
Findings
Gender-aware models improve gender translation accuracy by up to 30 points.
Manual annotation of datasets with gender information enables effective training.
Gender-aware solutions outperform gender-unaware models in real-world scenarios.
Abstract
In automatic speech translation (ST), traditional cascade approaches involving separate transcription and translation steps are giving ground to increasingly competitive and more robust direct solutions. In particular, by translating speech audio data without intermediate transcription, direct ST models are able to leverage and preserve essential information present in the input (e.g. speaker's vocal characteristics) that is otherwise lost in the cascade framework. Although such ability proved to be useful for gender translation, direct ST is nonetheless affected by gender bias just like its cascade counterpart, as well as machine translation and numerous other natural language processing applications. Moreover, direct ST systems that exclusively rely on vocal biometric features as a gender cue can be unsuitable and potentially harmful for certain users. Going beyond speech signals, in…
