Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Gašper Beguš, Alan Zhou

TL;DR
This paper introduces a visualization method for intermediate layers of unsupervised deep convolutional networks trained on speech data, enabling interpretation of learned features and acoustic properties without labeled data.
Contribution
It presents a technique to visualize and interpret convolutional layers in unsupervised acoustic word classification models by averaging over the individual feature maps in each layer and inferring the underlying word distributions with non-linear regression.
Findings
Convolutional layers can be visualized as informative time-series data
Underlying word distributions can be inferred and analyzed at different layers
The visualizations support hypothesis testing on acoustic and phonetic properties
Abstract
Understanding how deep convolutional neural networks classify data has been the subject of extensive research. This paper proposes a technique to visualize and interpret intermediate layers of unsupervised deep convolutional networks by averaging over individual feature maps in each convolutional layer and inferring underlying distributions of words with non-linear regression techniques. A GAN-based architecture (ciwGAN, arXiv:2006.02951) that includes a Generator, a Discriminator, and a classifier was trained on unlabeled sliced lexical items from TIMIT. The training process results in a deep convolutional network that learns to classify words into discrete classes only from the requirement of the Generator to output informative data. This classifier network has no access to the training data, only to the generated data. We propose a technique to visualize individual convolutional layers…
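The averaging step described in the abstract can be illustrated with a minimal sketch. It assumes that one layer's activations for a single input are available as an array of shape (channels, time); the function name `average_feature_maps` and the toy data are hypothetical and are not taken from the authors' code. The abstract does not specify further details (for example, how the activations are extracted or post-processed), so this shows only the general idea of collapsing many feature maps into one time series per layer.

```python
import numpy as np


def average_feature_maps(activations):
    """Average one conv layer's output over its feature maps (channels).

    activations: array-like of shape (channels, time) for a single input.
    Returns a 1-D array of length `time`, one value per time step.
    (Hypothetical helper for illustration; the input shape is an assumption.)
    """
    activations = np.asarray(activations, dtype=float)
    if activations.ndim != 2:
        raise ValueError("expected activations of shape (channels, time)")
    # Collapse the channel axis so the layer is summarized by one time series.
    return activations.mean(axis=0)


# Toy stand-in for a layer's output: 4 feature maps, 8 time steps.
# Random values clipped at zero; these are not real model activations.
rng = np.random.default_rng(0)
layer_output = np.maximum(rng.standard_normal((4, 8)), 0.0)

summary = average_feature_maps(layer_output)
print(summary.shape)
```

The resulting one-dimensional series can be plotted against time in the same way as a waveform, which is what makes a layer-by-layer comparison possible.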
