$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains
Soham Ray, Keshav Dhandhania, Victor Barres, and Karthik Narasimhan

TL;DR
The paper introduces $\tau$-voice, a comprehensive benchmark for evaluating full-duplex voice agents in complex, real-world scenarios, enabling comparison with text-based systems and identifying key areas for improvement.
Contribution
It presents a novel benchmark framework that assesses voice agents on grounded tasks with realistic audio and interaction dynamics, extending prior benchmarks to real-world conditions.
Findings
GPT-5 achieves 85% task completion in reasoning tasks.
Voice agents reach 31-51% success under clean conditions.
Failures are primarily due to agent behavior, not environment.
Abstract
Full-duplex voice agents--systems that listen and speak simultaneously--are rapidly moving from research to production. However, existing evaluations address conversational dynamics and task completion in isolation. We introduce $\tau$-voice, a benchmark for evaluating voice agents on grounded tasks with real-world complexity: agents must navigate complex multi-turn conversations, adhere to domain policies, and interact with the environment. The framework extends $\tau$-bench into a novel voice agent benchmark combining verifiable completion of complex grounded tasks, full-duplex interaction, and realistic audio--enabling direct comparison between voice and text performance. A controllable and realistic voice user simulator provides diverse accents, realistic audio environments, and rich turn-taking dynamics; by decoupling simulation from wall-clock time, the user simulator can use…
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and dialogue systems
