Visualizing Automatic Speech Recognition -- Means for a Better Understanding?

Karla Markert; Romain Parracone; Mykhailo Kulakov; Philip Sperl; Ching-Yu Kao; Konstantin Böttinger

arXiv:2202.00673·cs.LG·February 3, 2022

TL;DR

This paper explores visualization techniques adapted from image recognition to interpret deep neural network-based automatic speech recognition systems, aiming to improve understanding of their decision processes.

Contribution

It introduces the adaptation of attribution methods like LRP, Saliency Maps, and SHAP for audio data to clarify how ASR models process input features.

Findings

01. Visualization techniques reveal influential input features in ASR

02. Comparison of LRP, Saliency, and SHAP shows their relative effectiveness

03. Potential applications include detecting adversarial examples in speech recognition

Abstract

Automatic speech recognition (ASR) keeps getting better at mimicking human speech processing. How ASR systems work, however, remains largely obscured by the complex structure of the deep neural networks (DNNs) they are based on. In this paper, we show how so-called attribution methods, which we import from image recognition and adapt to handle audio data, can help clarify the workings of ASR. Taking DeepSpeech, an end-to-end ASR model, as a case study, we show how these techniques help visualize which input features are most influential in determining the output. We focus on three visualization techniques: Layer-wise Relevance Propagation (LRP), Saliency Maps, and Shapley Additive Explanations (SHAP). We compare these methods and discuss potential further applications, such as the detection of adversarial examples.
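
To make the idea of an attribution method concrete, the following is a minimal sketch of a gradient-based saliency map over spectrogram-like input. It is not the paper's implementation and does not use DeepSpeech: the tiny one-hidden-layer network, its random weights, the input shape (20 frames by 13 features), and the names `forward` and `saliency_map` are all assumptions chosen for illustration. The saliency of each time-frequency bin is taken as the absolute gradient of the predicted class score with respect to that bin.

```python
import numpy as np

def forward(x, params):
    """Toy stand-in for an acoustic model (hypothetical, not DeepSpeech).

    x: (T, F) array of frames by features; returns hidden activations and logits.
    """
    W1, b1, W2, b2 = params
    h = np.tanh(x.ravel() @ W1 + b1)
    logits = h @ W2 + b2
    return h, logits

def saliency_map(x, params, target):
    """Absolute gradient of the target logit with respect to every input bin."""
    W1, _, W2, _ = params
    h, _ = forward(x, params)
    # Chain rule: d logit / d h = W2[:, target], and d tanh(a) / d a = 1 - tanh(a)^2.
    delta = (1.0 - h ** 2) * W2[:, target]
    grad = W1 @ delta  # gradient with respect to the flattened input
    return np.abs(grad).reshape(x.shape)

# Illustrative sizes and random weights (assumptions, not values from the paper).
rng = np.random.default_rng(0)
T, F, H, C = 20, 13, 16, 5
params = (
    rng.normal(scale=0.1, size=(T * F, H)),
    np.zeros(H),
    rng.normal(scale=0.1, size=(H, C)),
    np.zeros(C),
)
x = rng.normal(size=(T, F))  # stand-in for a short feature sequence

_, logits = forward(x, params)
target = int(np.argmax(logits))        # explain the predicted class
sal = saliency_map(x, params, target)  # (T, F) relevance per bin
frame_relevance = sal.sum(axis=1)      # aggregate relevance per time frame
print(sal.shape, int(np.argmax(frame_relevance)))
```

Plotting `sal` as a heatmap next to the input features would give the kind of visualization the abstract describes. For a real ASR model, the gradient would typically come from an automatic differentiation framework rather than a hand-derived formula.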
