Cross-Language Speaker Attribute Prediction Using MIL and RL
Sunny Shu, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag

TL;DR
This paper introduces RLMIL-DAT, a novel multilingual framework combining reinforcement learning and domain adversarial training to improve cross-language speaker attribute prediction, especially in low-resource and zero-shot scenarios.
Contribution
The paper presents RLMIL-DAT, a new approach that enhances multilingual speaker attribute prediction by integrating reinforced instance selection with domain adversarial training.
Findings
RLMIL-DAT consistently outperforms standard MIL and RL-MIL in Macro F1 across model configurations and random seeds, with the largest gains on gender prediction.
Domain adversarial training significantly contributes to performance improvements.
The approach effectively transfers knowledge from high-resource to low-resource languages.
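Macro F1, the metric reported in these findings, averages per-class F1 scores without weighting by class frequency, which is why it suits the imbalanced setting the paper describes. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: a classifier that always predicts "f".
y_true = ["f"] * 8 + ["m"] * 2
y_pred = ["f"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case the majority-class predictor reaches 80% accuracy, but its Macro F1 is much lower because the minority class contributes an F1 of zero. The same quantity is available in scikit-learn as `f1_score(..., average="macro")`.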
Abstract
We study multilingual speaker attribute prediction under linguistic variation, domain mismatch, and data imbalance across languages. We propose RLMIL-DAT, a multilingual extension of the reinforced multiple instance learning framework that combines reinforcement learning based instance selection with domain adversarial training to encourage language invariant utterance representations. We evaluate the approach on a five language Twitter corpus in a few shot setting and on a VoxCeleb2 derived corpus covering forty languages in a zero shot setting for gender and age prediction. Across a wide range of model configurations and multiple random seeds, RLMIL-DAT consistently improves Macro F1 compared to standard multiple instance learning and the original reinforced multiple instance learning framework. The largest gains are observed for gender prediction, while age prediction remains more…
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Authorship Attribution and Profiling
