Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu, Wei Zeng, Ye Wang

TL;DR
This paper investigates gender bias in singing voice transcription (SVT) systems. It finds that these systems perform better on female singers ("female superiority"), a disparity driven by differences in pitch distribution, and proposes a method that reduces this bias while maintaining transcription performance.
Contribution
The paper identifies differences in pitch distribution as the cause of gender bias in SVT and introduces an adversarial training approach with conditional alignment to mitigate that bias.
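As a rough illustration of adversarial training against an attribute predictor, the sketch below uses a gradient reversal step, a common way to implement such adversarial objectives. The paper's exact mechanism, architecture, and form of conditional alignment are not given in this summary, so the scalar toy model, the parameter names (w, v, a, c, b), the weight lam, and feeding pitch to the predictor as the conditioning signal are all hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train_step(params: dict, x: float, pitch: float, gender: int,
               lam: float = 1.0, lr: float = 0.05) -> tuple:
    """One SGD step on a single (x, pitch, gender) example.

    Hypothetical toy model, not the paper's architecture:
      encoder:   z = w * x                       (acoustic representation)
      task head: pitch_hat = v * z               (squared-error pitch loss)
      predictor: logit = a * z + c * pitch + b   (gender, 1 = female, 0 = male)
    The c * pitch term conditions the predictor on pitch, an assumed
    stand-in for conditional alignment.
    """
    w, v, a, c, b = (params[k] for k in ("w", "v", "a", "c", "b"))
    z = w * x

    # Task loss and its gradients.
    err = v * z - pitch
    task_loss = err ** 2
    d_task_dz = 2.0 * err * v
    d_task_dv = 2.0 * err * z

    # Adversary loss: binary cross-entropy on the gender label.
    p = sigmoid(a * z + c * pitch + b)
    adv_loss = -(gender * math.log(p) + (1 - gender) * math.log(1.0 - p))
    d_adv_dlogit = p - gender
    d_adv_dz = d_adv_dlogit * a

    # The attribute predictor descends its own loss.
    params["a"] = a - lr * d_adv_dlogit * z
    params["c"] = c - lr * d_adv_dlogit * pitch
    params["b"] = b - lr * d_adv_dlogit
    # The task head descends the task loss.
    params["v"] = v - lr * d_task_dv
    # Gradient reversal: the encoder descends the task loss but ascends the
    # adversary loss (scaled by lam), pushing z to carry less gender information.
    d_enc_dz = d_task_dz - lam * d_adv_dz
    params["w"] = w - lr * d_enc_dz * x
    return task_loss, adv_loss
```

Setting lam to 0 reduces this to ordinary joint training, in which the predictor has no influence on the encoder; larger values trade task accuracy for invariance.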
Findings
Gender bias in SVT systems is significant and consistent across models and datasets.
Differences in pitch distribution, not gender data imbalance, cause the disparity.
The proposed method reduces gender bias by over 50% with minimal performance loss.
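The summary does not define how gender bias is measured. Assuming it is the gap between a transcription score (for example, note-level F1) computed separately on female and male singers, a reduction "by over 50%" can be read as a relative shrinkage of that gap. The helper names below are hypothetical.

```python
def gender_gap(female_score: float, male_score: float) -> float:
    """Signed gap between per-group scores; positive means female singers score higher."""
    return female_score - male_score


def relative_gap_reduction(baseline_gap: float, new_gap: float) -> float:
    """Fraction of the baseline gap's magnitude removed by the new system."""
    if baseline_gap == 0:
        raise ValueError("baseline gap is zero; relative reduction is undefined")
    return 1.0 - abs(new_gap) / abs(baseline_gap)
```

Under this reading, the third finding corresponds to relative_gap_reduction(baseline_gap, new_gap) > 0.5, with both gaps measured on the same test set.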
Abstract
It is widely known that males and females typically possess different sound characteristics when singing, such as timbre and pitch, but it has never been explored whether these gender-based characteristics lead to a performance disparity in singing voice transcription (SVT), whose target includes pitch. Such a disparity could cause fairness issues and severely affect the user experience of downstream SVT applications. Motivated by this, we first demonstrate the female superiority of SVT systems, which is observed across different models and datasets. We find that different pitch distributions, rather than gender data imbalance, contribute to this disparity. To address this issue, we propose using an attribute predictor to predict gender labels and adversarially training the SVT system to enforce the gender-invariance of acoustic representations. Leveraging the prior knowledge that pitch…
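Data imbalance and distribution shift can be separated by comparing per-gender pitch histograms after normalizing each to sum to one: normalization removes the effect of how many notes each group contributes, so any remaining distance reflects where the pitches lie. The sketch below assumes note pitches are available as integer MIDI numbers and uses total variation distance as an illustrative measure; the paper's own analysis may use a different one.

```python
from collections import Counter


def pitch_histogram(midi_notes):
    """Normalized histogram over integer MIDI note numbers (values sum to 1)."""
    counts = Counter(midi_notes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty note list")
    return {note: n / total for note, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means identical distributions, 1 means disjoint supports.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)
```

Repeating one group's notes ten times changes its count but not its histogram, so the distance is unchanged; only a shift in which pitches occur moves it.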
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
