Successes and critical failures of neural networks in capturing human-like speech recognition

Federico Adolfi; Jeffrey S. Bowers; David Poeppel

arXiv:2204.03740·cs.SD·April 20, 2023·1 cites

Open Access

TL;DR

This paper evaluates how well neural networks mimic human speech recognition robustness, revealing both similarities and critical failures, and suggests new directions for improving artificial auditory systems.

Contribution

It systematically compares neural network performance to human speech perception, identifying where models succeed and fail in capturing human-like robustness.

Findings

01 Neural networks replicate some human perceptual phenomena.

02 Models show robustness at certain spectrotemporal granularities.

03 All models fail to perceptually recover speech under conditions where humans do.

Abstract

Natural and artificial audition can in principle acquire different solutions to a given problem. The constraints of the task, however, can nudge the cognitive science and engineering of audition to qualitatively converge, suggesting that a closer mutual examination would potentially enrich artificial hearing systems and process models of the mind and brain. Speech recognition - an area ripe for such exploration - is inherently robust in humans to a number of transformations at various spectrotemporal granularities. To what extent are these robustness profiles accounted for by high-performing neural network systems? We bring together experiments in speech recognition under a single synthesis framework to evaluate state-of-the-art neural networks as stimulus-computable, optimized observers. In a series of experiments, we (1) clarify how influential speech manipulations in the literature…

Taxonomy

Topics: Speech Recognition and Synthesis