WASIL: In-the-Wild Arabic Spoken Interactions with LLMs

Zien Sheikh Ali; Hamdy Mubarak; Soon-Gyo Jung; Hunzalah Hassan Bhatti; Firoj Alam; Shammur Absar Chowdhury

arXiv:2605.16364·cs.SD·May 19, 2026

TL;DR

This paper introduces WASIL, a dataset of in-the-wild Arabic spoken interactions with LLMs. It includes audio, ASR hypotheses, assistant responses, and like/dislike feedback, and is intended for evaluating and improving voice assistant performance across dialects.

Contribution

It provides an annotated Arabic spoken interaction dataset (8,529 turns) with multi-dialect coverage (Modern Standard Arabic and four major dialects), gold transcripts, answerability labels, and a scalable LLM-based response evaluation method.

Findings

01. 14.2% of the 8,529 collected turns received a dislike.

02. Answerability annotations (answerable, ambiguous/needs-clarification, unsupported, not-a-request/noise) distinguish the causes of unanswerability.

03. A scalable, reference-free method evaluates assistant responses.

Abstract

Large Language Model (LLM) voice assistants are commonly built as cascaded systems that feed Automatic Speech Recognition (ASR) output into an LLM, so recognition errors can distort user intent. Dislikes may also arise from ambiguous, out-of-domain, or non-request turns, which makes it hard to isolate ASR effects. We release WASIL (the name denotes connection or linking in Arabic): in-the-wild Arabic spoken interaction prompts with audio, ASR hypotheses, assistant responses, and explicit like/dislike feedback (8,529 turns; 14.2% dislikes), plus a 2,000-turn test set covering Modern Standard Arabic (MSA) and four major dialects, with dialect labels. We provide low-cost gold transcripts via multi-ASR agreement-guided post-editing and annotate answerability (answerable, ambiguous/needs-clarification, unsupported, not-a-request/noise) to separate intrinsic unanswerability from ASR-induced degradation. Finally, we…
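
The abstract names multi-ASR agreement-guided post-editing but does not spell out the procedure here. The following is only a minimal sketch of one plausible reading, assuming that several ASR systems transcribe the same turn and that turns with high disagreement are routed to a human post-editor. The function names, the disagreement measure (mean pairwise word-level edit distance), and the 0.2 threshold are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pairwise_disagreement(hypotheses):
    """Mean normalized word edit distance over all pairs of hypotheses.

    Assumption: whitespace tokenization is adequate for illustration;
    real Arabic text would likely need normalization first.
    """
    token_lists = [h.split() for h in hypotheses]
    scores = []
    for a, b in combinations(token_lists, 2):
        denom = max(len(a), len(b), 1)  # avoid division by zero
        scores.append(word_edit_distance(a, b) / denom)
    return sum(scores) / len(scores) if scores else 0.0


def needs_post_editing(hypotheses, threshold=0.2):
    """Flag a turn for human post-editing when ASR systems disagree.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return pairwise_disagreement(hypotheses) > threshold
```

Under this sketch, a turn whose hypotheses all match would be accepted without review, while a turn where one system diverges substantially would be flagged, which is one way such a pipeline could keep gold-transcript costs low.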

Code & Models

Datasets

QCRI/WASIL
dataset · 2.3k downloads
