Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology

Robin Netzorg; Alyssa Cote; Sumi Koshin; Klo Vivienne Garoute; Gopala Krishna Anumanchipalli

arXiv:2407.07235·cs.SD·July 11, 2024

TL;DR

This paper introduces the Versatile Voice Dataset to highlight limitations of current gender-based speaker models and advocates for modeling individual vocal qualities to better capture voice identity.

Contribution

It presents the VVD and demonstrates that existing gender classification and speaker verification systems fail to capture voice flexibility, proposing a focus on vocal texture qualities instead.

Findings

01. Current models fail with voice modifications
02. Gender classification is highly sensitive to voice changes
03. Speaker verification struggles with drastic voice modifications

Abstract

As experts in voice modification, trans-feminine gender-affirming voice teachers have unique perspectives on voice that confound current understandings of speaker identity. To demonstrate this, we present the Versatile Voice Dataset (VVD), a collection of three speakers modifying their voices along gendered axes. The VVD illustrates that current approaches in speaker modeling, based on categorical notions of gender and a static understanding of vocal texture, fail to account for the flexibility of the vocal tract. Utilizing publicly available speaker embeddings, we demonstrate that gender classification systems are highly sensitive to voice modification, and speaker verification systems fail to identify voices as coming from the same speaker as voice modification becomes more drastic. As one path towards moving beyond categorical and static notions of speaker identity, we propose…
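
Speaker verification on top of embeddings is commonly scored by comparing an enrollment embedding with a test embedding and accepting the trial when their cosine similarity clears a threshold. The sketch below illustrates that general idea only. It assumes cosine scoring and a hypothetical threshold of 0.5; the abstract does not say which scoring rule, threshold, or embedding model the authors used, and the three-dimensional vectors are invented toy values, not VVD embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two speaker-embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(enroll: Sequence[float], test: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the trial as the same speaker if similarity meets the threshold.

    The 0.5 default is a hypothetical value chosen for illustration only.
    """
    return cosine_similarity(enroll, test) >= threshold


# Toy 3-dimensional "embeddings" (invented, not VVD data): one speaker's
# baseline voice, a mild modification, and a drastic modification.
baseline = [1.0, 0.0, 0.0]
mild = [0.9, 0.3, 0.0]
drastic = [0.2, 0.9, 0.3]

for name, emb in [("mild", mild), ("drastic", drastic)]:
    score = cosine_similarity(baseline, emb)
    print(f"{name}: score={score:.3f}, same speaker={same_speaker(baseline, emb)}")
```

With these toy vectors the mild modification scores about 0.95 and is accepted, while the drastic one scores about 0.21 and is rejected. This mirrors in miniature the qualitative trend the abstract describes; real systems use high-dimensional embeddings and thresholds tuned on evaluation data.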

Taxonomy

Topics: Digital Communication and Language