WASIL: In-the-Wild Arabic Spoken Interactions with LLMs
Zien Sheikh Ali, Hamdy Mubarak, Soon-Gyo Jung, Hunzalah Hassan Bhatti, Firoj Alam, Shammur Absar Chowdhury

TL;DR
This paper introduces WASIL, a dataset of in-the-wild Arabic spoken interactions with LLMs that pairs audio with ASR hypotheses, assistant responses, and like/dislike feedback, for evaluating and improving voice assistant performance across dialects.
Contribution
It provides a large, annotated Arabic spoken interaction dataset with multi-dialect coverage, gold transcripts, answerability labels, and a scalable LLM-based response evaluation method.
Findings
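14.2% of the 8,529 collected turns received a dislike
Answerability annotations (answerable, ambiguous/needs-clarification, unsupported, not-a-request/noise) separate intrinsically unanswerable turns from ASR-induced degradation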
Scalable reference-free evaluation method for responses
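As a rough illustration of how the answerability labels could be used alongside the like/dislike feedback, the sketch below computes a dislike rate per label. The four label values come from the abstract; the record layout, field names, and the function dislike_rate_by_label are hypothetical and are not taken from the released dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Answerability(Enum):
    # Label values as listed in the abstract.
    ANSWERABLE = "answerable"
    AMBIGUOUS = "ambiguous/needs-clarification"
    UNSUPPORTED = "unsupported"
    NOT_A_REQUEST = "not-a-request/noise"


@dataclass(frozen=True)
class Turn:
    # Hypothetical record layout; the real dataset schema may differ.
    turn_id: str
    asr_hypothesis: str
    response: str
    disliked: bool
    answerability: Answerability


def dislike_rate_by_label(turns):
    """Return {label: fraction of turns with that label that were disliked}."""
    totals = Counter(t.answerability for t in turns)
    dislikes = Counter(t.answerability for t in turns if t.disliked)
    # Only labels that occur at least once appear in the result.
    return {label: dislikes[label] / totals[label] for label in totals}


# Toy records for illustration only, not real WASIL data.
toy_turns = [
    Turn("t1", "what is the capital of qatar", "Doha.", False, Answerability.ANSWERABLE),
    Turn("t2", "what is the capitol of qatar", "I am not sure.", True, Answerability.ANSWERABLE),
    Turn("t3", "uh hmm", "Could you repeat that?", True, Answerability.NOT_A_REQUEST),
]
rates = dislike_rate_by_label(toy_turns)
```

Comparing the rate for answerable turns against the other labels is one simple way to see how much of the overall dislike rate comes from turns that no assistant could have answered well.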
Abstract
Large Language Model (LLM) voice assistants are commonly built as cascaded Automatic Speech Recognition (ASR)-to-LLM systems, where recognition errors can distort user intent. Dislikes may also arise from ambiguous, out-of-domain, or non-request turns, making it hard to isolate ASR effects. We release WASIL (the name denotes connection or linking in Arabic), a collection of in-the-wild Arabic spoken interaction prompts with audio, ASR hypotheses, assistant responses, and explicit like/dislike feedback (8,529 turns; 14.2% dislikes), plus a 2,000-turn test set covering Modern Standard Arabic (MSA) and four major dialects, with dialect labels. We provide low-cost gold transcripts via multi-ASR agreement-guided post-editing and annotate answerability (answerable, ambiguous/needs-clarification, unsupported, not-a-request/noise) to separate intrinsic unanswerability from ASR-induced degradation. Finally, we…
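The abstract does not spell out how multi-ASR agreement guides post-editing, so the following is only a minimal sketch of one plausible reading: score each turn by the mean pairwise word-level agreement between ASR hypotheses, use the hypothesis that agrees most with the others as the draft transcript, and send low-agreement turns to a human editor. The function names, the agreement formula, and the 0.8 threshold are assumptions for illustration, not the authors' procedure.

```python
from itertools import combinations


def word_edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pairwise_agreement(hyp_a, hyp_b):
    """1.0 for identical transcripts, lower as word-level edits increase."""
    a, b = hyp_a.split(), hyp_b.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - word_edit_distance(a, b) / longest


def triage(hypotheses, threshold=0.8):
    """hypotheses: dict mapping an ASR system name to its transcript.

    Returns (draft_transcript, mean_agreement, needs_review).
    The threshold is an assumed value, not one reported in the paper.
    """
    names = list(hypotheses)
    if len(names) < 2:
        raise ValueError("need at least two ASR hypotheses")
    per_system = {name: 0.0 for name in names}
    total, pairs = 0.0, 0
    for x, y in combinations(names, 2):
        score = pairwise_agreement(hypotheses[x], hypotheses[y])
        per_system[x] += score
        per_system[y] += score
        total += score
        pairs += 1
    # Draft = hypothesis that agrees most with the others (first one on ties).
    draft_name = max(names, key=lambda name: per_system[name])
    mean_agreement = total / pairs
    return hypotheses[draft_name], mean_agreement, mean_agreement < threshold


# English toy strings for readability; real inputs would be Arabic transcripts.
draft, agreement, needs_review = triage({
    "asr_a": "what is the weather today",
    "asr_b": "what is the weather today",
    "asr_c": "what is the whether today",
})
```

Under this reading, annotation cost drops because turns on which the systems largely agree need only a light check, while the editor's time goes to the turns where the hypotheses diverge.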
