Moravec's Paradox: Towards an Auditory Turing Test

David Noever; Forrest McKee

arXiv:2507.23091 · cs.AI · August 1, 2025

TL;DR

This paper introduces an auditory Turing test revealing that current AI models perform poorly on complex auditory tasks, exposing fundamental gaps in machine listening abilities compared to humans.

Contribution

It presents a new benchmark with 917 challenges across diverse auditory categories to evaluate AI systems' human-like listening capabilities.

Findings

01 AI models fail on more than 93% of the auditory challenges

02 Even the best-performing model achieves only 6.9% accuracy

03 Current architectures lack mechanisms for human-like auditory scene analysis

Abstract

This work demonstrates that current AI systems fail catastrophically on auditory tasks that humans perform effortlessly. Drawing inspiration from Moravec's paradox (i.e., tasks simple for humans often prove difficult for machines, and vice versa), we introduce an auditory Turing test comprising 917 challenges across seven categories: overlapping speech, speech in noise, temporal distortion, spatial audio, coffee-shop noise, phone distortion, and perceptual illusions. Our evaluation of state-of-the-art audio models, including GPT-4's audio capabilities and OpenAI's Whisper, reveals a striking failure rate exceeding 93%, with even the best-performing model achieving only 6.9% accuracy on tasks that humans solved at a success rate 7.5 times higher (52%). These results expose focusing failures in how AI systems process complex auditory scenes, particularly in selective attention, noise…
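The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the percentages quoted above (6.9% best-model accuracy, 52% human accuracy); the variable names are illustrative and not taken from the paper.

```python
# Percentages quoted in the abstract; variable names are illustrative.
best_model_accuracy = 6.9   # best-performing model, percent correct
human_accuracy = 52.0       # human listeners, percent correct

# Failure rate of the best model is the complement of its accuracy.
best_model_failure = 100.0 - best_model_accuracy

# Ratio of human to best-model accuracy (the "7.5 times" claim).
human_to_model_ratio = human_accuracy / best_model_accuracy

print(f"Best-model failure rate: {best_model_failure:.1f}%")
print(f"Human-to-model accuracy ratio: {human_to_model_ratio:.1f}x")
```

The two results agree with the stated failure rate of over 93% and the roughly 7.5-fold human advantage.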
