MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs

Zien Sheikh Ali; Hunzalah Hassan Bhatti; Rabindra Nath Nandi; Shammur Absar Chowdhury; Firoj Alam

arXiv:2602.07036·cs.SD·February 10, 2026

TL;DR

This paper introduces MENASpeechBank, a diverse speech dataset from MENA speakers, and a synthetic data pipeline for creating persona-grounded multi-turn conversations to enhance AudioLLMs.

Contribution

The paper contributes a new high-quality speech dataset and a controllable synthetic data generation pipeline for persona-based conversational audio modeling.

Findings

01 MENASpeechBank includes about 18K utterances from 124 speakers spanning multiple MENA countries.

02 The pipeline generated 417K role-play conversations for training and evaluation.

03 Synthetic data improves AudioLLMs' ability to handle persona and dialectal diversity.

Abstract

Audio large language models (AudioLLMs) enable instruction-following over speech and general audio, but progress is increasingly limited by the lack of diverse, conversational, instruction-aligned speech-text data. This bottleneck is especially acute for persona-grounded interactions and dialectal coverage, where collecting and releasing real multi-speaker recordings is costly and slow. We introduce MENASpeechBank, a reference speech bank comprising about 18K high-quality utterances from 124 speakers spanning multiple MENA countries, covering English, Modern Standard Arabic (MSA), and regional Arabic varieties. Building on this resource, we develop a controllable synthetic data pipeline that: (i) constructs persona profiles enriched with World Values Survey-inspired attributes, (ii) defines a taxonomy of about 5K conversational scenarios, (iii) matches personas to scenarios via semantic…
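Step (iii) of the pipeline pairs persona profiles with conversational scenarios. The abstract is cut off before it states the matching criterion, so the following is only a minimal sketch under an assumed criterion: cosine similarity over bag-of-words counts. The function names, persona texts, and scenario texts are all hypothetical and are not taken from the paper or the released dataset.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenized term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_personas(personas, scenarios):
    """Assign each persona the scenario with the highest similarity score."""
    scenario_vecs = {name: bag_of_words(desc) for name, desc in scenarios.items()}
    matches = {}
    for pid, profile in personas.items():
        pvec = bag_of_words(profile)
        matches[pid] = max(
            scenario_vecs, key=lambda name: cosine(pvec, scenario_vecs[name])
        )
    return matches


# Hypothetical toy data, for illustration only.
personas = {
    "p1": "university student interested in travel and food",
    "p2": "retired engineer who enjoys gardening at home",
}
scenarios = {
    "booking_trip": "planning travel and booking a trip abroad",
    "garden_advice": "asking for gardening advice for a home garden",
}

matches = match_personas(personas, scenarios)
print(matches)
```

A real system would more plausibly use learned sentence embeddings than raw token counts; the token-count version is used here only so the sketch runs with the standard library alone.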

Code & Models

Datasets

QCRI/MenaSpeechBank
dataset · 3.8k downloads

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · AI in Service Interactions