Moravec's Paradox: Towards an Auditory Turing Test
David Noever, Forrest McKee

TL;DR
This paper introduces an auditory Turing test revealing that current AI models perform poorly on complex auditory tasks, exposing fundamental gaps in machine listening abilities compared to humans.
Contribution
It presents a new benchmark with 917 challenges across diverse auditory categories to evaluate AI systems' human-like listening capabilities.
Findings
AI models fail on more than 93% of the auditory challenges
Even the best model achieves only 6.9% accuracy
Current architectures lack mechanisms for human-like auditory scene analysis
Abstract
This work demonstrates that current AI systems fail catastrophically on auditory tasks that humans perform effortlessly. Drawing inspiration from Moravec's paradox (i.e., tasks simple for humans often prove difficult for machines, and vice versa), we introduce an auditory Turing test comprising 917 challenges across seven categories: overlapping speech, speech in noise, temporal distortion, spatial audio, coffee-shop noise, phone distortion, and perceptual illusions. Our evaluation of state-of-the-art audio models, including GPT-4's audio capabilities and OpenAI's Whisper, reveals a striking failure rate exceeding 93%, with even the best-performing model achieving only 6.9% accuracy on tasks that humans solved with a 52% success rate, roughly 7.5 times higher. These results expose focusing failures in how AI systems process complex auditory scenes, particularly in selective attention, noise…
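The headline figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the two percentages stated above (6.9% best-model accuracy and 52% human success). The variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract, in percent. Variable names are hypothetical.
best_model_accuracy = 6.9
human_accuracy = 52.0

# The best model's failure rate is the complement of its accuracy.
best_model_failure_rate = 100.0 - best_model_accuracy

# How many times higher human success is than the best model's accuracy.
human_to_model_ratio = human_accuracy / best_model_accuracy

print(f"best-model failure rate: {best_model_failure_rate:.1f}%")
print(f"human / model accuracy ratio: {human_to_model_ratio:.1f}x")
```

Under these inputs the best model's failure rate comes out just above 93%, and the human-to-model ratio rounds to 7.5, consistent with the numbers reported above.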
