Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification

Jesin James; Balamurali B. T.; Binu Abeysinghe; Junchen Liu

arXiv:2407.17416 · eess.AS · July 25, 2024

TL;DR

This paper explores how neural networks classify vowels in spectrograms, revealing the acoustic features they rely on, and compares these with linguistic knowledge to improve speech recognition interpretability.

Contribution

It introduces a method using class activation mapping to interpret neural network decisions in vowel classification, linking neural features to linguistic cues.

Findings

01

Neural networks focus on specific frequency patterns for vowel classification.

02

Identified acoustic cues align with linguistic knowledge of vowels.

03

Insights into the causes of misclassification point to ways of improving speech recognition models.

Abstract

This study investigates discriminative patterns learned by neural networks for accurate speech classification, with a specific focus on vowel classification tasks. By examining the activations and features of neural networks for vowel classification, we gain insights into what the networks "see" in spectrograms. Through the use of class activation mapping, we identify the frequencies that contribute to vowel classification and compare these findings with linguistic knowledge. Experiments on an American English dataset of vowels showcase the explainability of neural networks and provide valuable insights into the causes of misclassifications and their characteristics when differentiating them from unvoiced speech. This study not only enhances our understanding of the underlying acoustic cues in vowel classification but also offers opportunities for improving speech recognition by…
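The class activation mapping step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classic CAM setting, in which the last convolutional layer feeds a global average pooling layer and then a single linear classifier; whether the paper's network matches this layout is not stated here. The function names, array shapes, and random inputs are all hypothetical, and clipping negative values before scaling is a display choice rather than part of the method.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weighted sum of the last conv feature maps for one class.

    feature_maps: (K, F, T) array, K channels over F frequency bins and T frames
                  (assumed layout, not taken from the paper).
    fc_weights:   (C, K) array, weights of the linear layer after global
                  average pooling.
    """
    # Sum over channels k of w[class, k] * A_k(f, t), giving an (F, T) map.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    # Keep only positive evidence and scale to [0, 1] for display.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def frequency_profile(cam):
    """Average the map over time to get one importance value per frequency bin."""
    return cam.mean(axis=1)

# Toy inputs standing in for a trained network's activations and weights.
rng = np.random.default_rng(0)
feature_maps = rng.random((8, 16, 20))     # 8 channels, 16 freq bins, 20 frames
fc_weights = rng.standard_normal((3, 8))   # 3 hypothetical vowel classes

cam = class_activation_map(feature_maps, fc_weights, class_idx=1)
profile = frequency_profile(cam)
print(cam.shape, profile.shape)
```

Averaging the map over time is one simple way to read off which frequency bins carry the most evidence for a class; those bins could then be compared against known formant regions for the vowel in question.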

Code & Models

Repositories

maorienglish-codeswitch/vowel_classification
PyTorch · Official
