Insights on Neural Representations for End-to-End Speech Recognition
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

TL;DR
This paper investigates the internal layer-wise neural representations of end-to-end speech recognition models using correlation analysis techniques, revealing architecture-specific dynamics that can inform the design of improved models.
Contribution
The paper introduces the use of canonical correlation analysis (CCA) and centered kernel alignment (CKA) to analyze neural representations in CNN, LSTM, and Transformer ASR models, providing new insights into their internal dynamics during training.
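To make the centered kernel alignment (CKA) measure concrete, here is a minimal sketch of linear CKA between two layers' activations. It assumes each layer's output has been flattened to a matrix of shape (examples, features); the function name `linear_cka` and the random test matrices are illustrative, and the kernel choice and preprocessing used in the paper may differ.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x: array of shape (n_examples, n_features_x)
    y: array of shape (n_examples, n_features_y)
    Returns a similarity score in [0, 1].
    """
    # Center every feature (column) so the Gram matrices are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical activations standing in for real layer outputs.
rng = np.random.default_rng(0)
acts_a = rng.standard_normal((200, 16))

# A rotated and rescaled copy of acts_a: linear CKA should treat it as identical.
rotation, _ = np.linalg.qr(rng.standard_normal((16, 16)))
acts_b = 3.0 * acts_a @ rotation

# Unrelated activations with a different width.
acts_c = rng.standard_normal((200, 32))

same = linear_cka(acts_a, acts_a)
rotated = linear_cka(acts_a, acts_b)
unrelated = linear_cka(acts_a, acts_c)
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, the rotated copy scores the same as the original, and the two layers being compared need not have the same width.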
Findings
CNN layers show hierarchical correlation dependencies
LSTM layers exhibit bottom-up correlation patterns
Transformers display irregular correlation patterns
Abstract
End-to-end automatic speech recognition (ASR) models aim to learn a generalised speech representation. However, few tools are available for understanding the internal functions of these models and the effect of hierarchical dependencies within the model architecture. Understanding the correlations between layer-wise representations is crucial for deriving insights into the relationship between neural representations and performance. Correlation analysis techniques previously used to investigate network similarities have not been applied to end-to-end ASR models. This paper analyses the internal dynamics between layers during training of CNN-, LSTM- and Transformer-based approaches, using canonical correlation analysis (CCA) and centered kernel alignment (CKA). It was found that neural representations within CNN layers exhibit hierarchical correlation…
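As a rough illustration of comparing two layers with canonical correlation analysis, the sketch below computes the canonical correlations between two activation matrices and averages them into one similarity score. It assumes plain mean CCA on matrices of shape (examples, features) with more examples than features; the names `canonical_correlations` and `mean_cca_similarity` are hypothetical, and the CCA variant and preprocessing used in the paper are not given in this summary.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between the column spaces of x and y.

    x: array of shape (n_examples, n_features_x)
    y: array of shape (n_examples, n_features_y)
    Assumes n_examples exceeds both feature counts and that both
    matrices have full column rank after centering.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Orthonormal bases for the two centered column spaces.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)

    # Singular values of Qx^T Qy are the canonical correlations.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


def mean_cca_similarity(x, y):
    """Average canonical correlation, used here as a single similarity score."""
    return float(canonical_correlations(x, y).mean())


# Hypothetical activations standing in for real layer outputs.
rng = np.random.default_rng(1)
layer_a = rng.standard_normal((300, 8))

# An invertible affine map of layer_a: CCA should report perfect similarity.
layer_b = layer_a @ rng.standard_normal((8, 8)) + 2.0

# Independent activations: canonical correlations should be small.
layer_c = rng.standard_normal((300, 8))

affine_score = mean_cca_similarity(layer_a, layer_b)
independent_score = mean_cca_similarity(layer_a, layer_c)
```

Unlike linear CKA, CCA is invariant to any invertible linear transformation of the features, which is why the affine copy scores close to 1 here.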
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Softmax · Sigmoid Activation · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Absolute Position Encodings · Tanh Activation · Long Short-Term Memory
