How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation
Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli

TL;DR
This paper proposes a multi-gender speech translation model that incorporates speaker gender metadata, outperforming gender-specific models in accuracy and offering a more maintainable solution for gender-aware translation.
Contribution
It introduces a single multi-gender speech translation model that effectively integrates speaker gender metadata, eliminating the need for separate models.
Findings
The multi-gender model outperforms gender-specific models in accuracy.
Incorporating speaker metadata improves gender assignment accuracy.
Fine-tuning from existing models is less effective than training from scratch.
Abstract
When translating from notional gender languages (e.g., English) into grammatical gender languages (e.g., Italian), the generated translation requires explicit gender assignments for various words, including those referring to the speaker. When the source sentence does not convey the speaker's gender, speech translation (ST) models either rely on the possibly misleading vocal traits of the speaker or default to the masculine gender, the most frequent in existing training corpora. To avoid such biased and non-inclusive behaviors, the gender assignment of speaker-related expressions should be guided by externally provided metadata about the speaker's gender. While previous work has shown that the most effective solution is represented by separate, dedicated gender-specific models, the goal of this paper is to achieve the same results by integrating the speaker's gender metadata into a…
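The abstract does not spell out how the metadata enters the model. As a minimal sketch, assuming a tag-based scheme borrowed from multilingual translation (where a target-language token is prepended to the output), a single model could be conditioned on the speaker's gender by prepending a gender tag to the reference translation during training and forcing that tag as a decoder prefix at inference. The tag tokens `<F>`/`<M>` and the helpers `add_gender_tag`, `decoder_prefix`, and `strip_gender_tag` are hypothetical illustrations, not the authors' architecture.

```python
from typing import List

# Hypothetical tag vocabulary: the excerpt does not say how the paper encodes
# the speaker's gender, so these tokens are illustrative assumptions.
GENDER_TAGS = {"female": "<F>", "male": "<M>"}


def add_gender_tag(target_tokens: List[str], speaker_gender: str) -> List[str]:
    """Training time: prepend the speaker-gender tag to the reference translation."""
    if speaker_gender not in GENDER_TAGS:
        raise ValueError(f"unknown speaker gender: {speaker_gender!r}")
    return [GENDER_TAGS[speaker_gender]] + list(target_tokens)


def decoder_prefix(speaker_gender: str, bos: str = "<s>") -> List[str]:
    """Inference time: start decoding from BOS plus the tag taken from the
    externally provided metadata, so the gender is not inferred from the voice."""
    if speaker_gender not in GENDER_TAGS:
        raise ValueError(f"unknown speaker gender: {speaker_gender!r}")
    return [bos, GENDER_TAGS[speaker_gender]]


def strip_gender_tag(output_tokens: List[str]) -> List[str]:
    """Remove gender tags from the generated tokens before detokenizing or scoring."""
    tags = set(GENDER_TAGS.values())
    return [tok for tok in output_tokens if tok not in tags]


if __name__ == "__main__":
    # "I am tired" said by a female speaker: Italian needs the feminine "stanca".
    print(add_gender_tag(["Sono", "stanca", "."], "female"))
    print(decoder_prefix("male"))
    print(strip_gender_tag(["<s>", "<M>", "Sono", "stanco", "."]))
```

Because the tag comes from metadata rather than being predicted, one model can serve both speaker genders, which is the maintainability advantage the summary claims over separate gender-specific models.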
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
