Cross-Language Speaker Attribute Prediction Using MIL and RL

Sunny Shu; Seyed Sahand Mohammadi Ziabari; Ali Mohammed Mansoor Alsahag

arXiv:2601.04257 · cs.AI · January 9, 2026

TL;DR

This paper introduces RLMIL-DAT, a novel multilingual framework combining reinforcement learning and domain adversarial training to improve cross-language speaker attribute prediction, especially in low-resource and zero-shot scenarios.

Contribution

The paper presents RLMIL-DAT, a new approach that enhances multilingual speaker attribute prediction by integrating reinforced instance selection with domain adversarial training.
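This page does not describe how the reinforced instance selection works. The sketch below is a rough illustration only and assumes a common REINFORCE setup. Each utterance (instance) in a speaker's bag gets its own Bernoulli keep/drop policy, parameterized by a logit. The kept utterance embeddings are mean-pooled into a bag representation, and the logits are updated with a baseline-subtracted reward. The sigmoid-Bernoulli policy, the mean pooling, the reward and baseline inputs, and all function names are hypothetical. None of them is taken from RLMIL-DAT or RL-MIL.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sample_selection(logits: list[float], rng: random.Random) -> tuple[list[float], list[int]]:
    """Hypothetical policy: sample an independent keep (1) / drop (0) decision per instance."""
    probs = [sigmoid(z) for z in logits]
    mask = [1 if rng.random() < p else 0 for p in probs]
    return probs, mask


def mean_pool(instances: list[list[float]], mask: list[int]) -> list[float]:
    """Average the kept instance embeddings; fall back to all instances if none are kept."""
    kept = [v for v, m in zip(instances, mask) if m] or instances
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def reinforce_logit_grad(
    probs: list[float], mask: list[int], reward: float, baseline: float
) -> list[float]:
    """REINFORCE estimate of d(expected reward)/d(logit) for each instance.

    For p = sigmoid(z), d log Bernoulli(m; p) / dz = m - p, so each instance's
    estimate is (reward - baseline) * (m - p).
    """
    advantage = reward - baseline
    return [advantage * (m - p) for m, p in zip(mask, probs)]
```

In a training loop, one might take the reward from the bag classifier's prediction (for instance, its negative loss) and then take a gradient-ascent step such as `logits[i] += lr * grad[i]`. Subtracting a baseline, such as a running mean of past rewards, leaves the estimate unbiased while reducing its variance.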

Findings

1. RLMIL-DAT consistently outperforms standard MIL and RL-MIL in Macro F1 scores.
2. Domain adversarial training significantly contributes to performance improvements.
3. The approach effectively transfers knowledge from high-resource to low-resource languages.

Abstract

We study multilingual speaker attribute prediction under linguistic variation, domain mismatch, and data imbalance across languages. We propose RLMIL-DAT, a multilingual extension of the reinforced multiple instance learning framework that combines reinforcement-learning-based instance selection with domain adversarial training to encourage language-invariant utterance representations. We evaluate the approach on a five-language Twitter corpus in a few-shot setting and on a VoxCeleb2-derived corpus covering forty languages in a zero-shot setting for gender and age prediction. Across a wide range of model configurations and multiple random seeds, RLMIL-DAT consistently improves Macro F1 over standard multiple instance learning and the original reinforced multiple instance learning framework. The largest gains are observed for gender prediction, while age prediction remains more…
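Domain adversarial training is often implemented with a gradient reversal layer (GRL). A domain classifier, here a language classifier, learns to predict the input's language from the encoder output. The GRL is the identity on the forward pass but flips the sign of that classifier's gradient before it reaches the encoder. The encoder is thereby pushed toward representations the classifier cannot separate by language. The abstract does not say whether RLMIL-DAT uses a GRL, so what follows is a hypothetical scalar sketch with hand-derived gradients. The toy model, `GradientReversal`, `dat_gradients`, and the trade-off weight `lam` are all illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GradientReversal:
    """Identity on the forward pass; multiplies incoming gradients by -lam on the backward pass."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, h: float) -> float:
        return h

    def backward(self, grad_h: float) -> float:
        return -self.lam * grad_h


def dat_gradients(x, w, a, y, b, c, lang, lam):
    """Hypothetical toy model with encoder weight w.

    Encoder:      h = w * x
    Task head:    y_hat = a * h, loss 0.5 * (y_hat - y) ** 2
    Domain head:  p = sigmoid(b * GRL(h) + c), binary cross-entropy against lang in {0, 1}
    Returns (dL/dw, dL/db). The encoder sees the task gradient plus the
    REVERSED domain gradient; the domain head's own gradient is not reversed.
    """
    grl = GradientReversal(lam)
    h = w * x
    # Task branch: d(0.5 * (a*h - y)^2) / dh
    dtask_dh = (a * h - y) * a
    # Domain branch: the BCE gradient w.r.t. the logit is p - lang
    h_rev = grl.forward(h)
    p = sigmoid(b * h_rev + c)
    ddom_dlogit = p - lang
    ddom_db = ddom_dlogit * h_rev  # domain head minimizes its loss as usual
    ddom_dh = grl.backward(ddom_dlogit * b)  # sign flipped before reaching the encoder
    dh = dtask_dh + ddom_dh
    return dh * x, ddom_db
```

With the task head switched off (`a = 0`), the encoder's update is the negative of what would help the language classifier. Gradient descent on `w` then makes the language harder to predict, while the same step on `b` makes the classifier better.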

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Authorship Attribution and Profiling