Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees
Yuanchao Wang, Wenji Du, Chenghao Cai, Yanyan Xu

TL;DR
This paper uses decision trees to analyze the attention mechanism in end-to-end speech recognition, revealing that attention is mainly influenced by previous states and struggles with long-term dependencies.
Contribution
It introduces a decision tree-based approach to interpret the attention mechanism, providing new insights into its behavior in speech recognition systems.
Findings
Attention is mainly influenced by previous states
Default attention favors closer states
Poor modeling of long-term dependencies
Abstract
The attention mechanism has greatly improved the performance of end-to-end speech recognition systems. However, the underlying behaviour of attention is not yet clear. In this study, we use decision trees to explain how the attention mechanism influences itself in speech recognition. The results indicate that attention levels are largely affected by their previous states rather than by the encoder and decoder patterns. Additionally, the default attention mechanism seems to put more weight on closer states, but performs poorly at modelling long-term dependencies of attention states.
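One way to picture the kind of analysis the abstract describes is to fit a small regression tree that predicts an attention weight from candidate inputs, then compare how much each input reduces the tree's error. The sketch below is only an illustration under stated assumptions: this summary does not give the paper's actual features, tree type, or data, so the feature names (prev_attention, encoder_feat, decoder_feat) are hypothetical and the data is synthetic, generated so that the previous attention value dominates. The output therefore shows how the procedure works and does not reproduce the paper's results.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, targets):
    """Return (gain, feature_index, threshold) for the split that most
    reduces the squared error, or None if no split is possible."""
    parent = sse(targets)
    best = None
    for f in range(len(rows[0])):
        # Skip the smallest value so the left side is never empty.
        for threshold in sorted(set(r[f] for r in rows))[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, threshold)
    return best

def grow(rows, targets, depth, importance):
    """Recursively split the data, adding each split's error reduction
    to the importance score of the feature that was split on."""
    if depth == 0 or len(rows) < 4:
        return
    split = best_split(rows, targets)
    if split is None or split[0] <= 0:
        return
    gain, f, threshold = split
    importance[f] += gain
    for side in (lambda r: r[f] < threshold, lambda r: r[f] >= threshold):
        keep = [(r, t) for r, t in zip(rows, targets) if side(r)]
        grow([r for r, _ in keep], [t for _, t in keep], depth - 1, importance)

# Synthetic data (an assumption, not the paper's data): the current
# attention weight is driven mostly by the previous attention weight.
random.seed(0)
FEATURES = ["prev_attention", "encoder_feat", "decoder_feat"]
rows, targets = [], []
for _ in range(200):
    prev, enc, dec = random.random(), random.random(), random.random()
    rows.append((prev, enc, dec))
    targets.append(0.9 * prev + 0.05 * enc + random.gauss(0, 0.02))

importance = [0.0] * len(FEATURES)
grow(rows, targets, depth=3, importance=importance)

# Normalise so the importance shares sum to 1.
total = sum(importance)
shares = {name: imp / total for name, imp in zip(FEATURES, importance)}
top_feature = max(shares, key=shares.get)
print(shares, top_feature)
```

A shallow tree is used here because its splits stay easy to read; the importance share of a feature is simply the fraction of total error reduction obtained by splitting on it.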
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Neural Networks and Applications
