Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

Yonatan Belinkov; Ahmed Ali; James Glass

arXiv:1907.04224·cs.CL·April 21, 2020

TL;DR

This paper investigates how end-to-end neural speech recognition models represent phonetic and graphemic information in their internal layers, studying two languages (English and Arabic) across multiple datasets.

Contribution

It provides a detailed analysis of internal representations in end-to-end ASR models, highlighting how phonetic and graphemic information is encoded across layers and languages.

Findings

1. Consistent layer-wise representation of phonetic and graphemic features across datasets.

2. Differences in representation quality between English and Arabic.

3. Insights into the interpretability of end-to-end neural ASR models.

Abstract

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions. In contrast to modular ASR systems, which contain separately-trained components for acoustic modeling, pronunciation lexicon, and language modeling, the end-to-end paradigm is both conceptually simpler and has the potential benefit of training the entire system on the end task. However, such neural network models are more opaque: it is not clear how to interpret the role of different parts of the network and what information it learns during training. In this paper, we analyze the learned internal representations in an end-to-end ASR model. We evaluate the representation quality in terms of several classification tasks, comparing phonemes and graphemes, as well as different articulatory features. We study two languages (English and Arabic) and three…
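
One common way to evaluate representation quality with a classification task is a probing classifier: the ASR model is frozen, frame-level activations are extracted from one layer, and a small classifier is trained to predict a label such as the phoneme for each frame. Higher held-out accuracy suggests the layer encodes more of that information. The sketch below is a minimal illustration of that idea only. It assumes a linear softmax probe and uses synthetic vectors in place of real activations; the function names, dimensions, and noise levels are hypothetical and are not taken from the paper or from the boknilev/asr-repr-analysis repository.

```python
import math
import random


def make_frames(n_per_class, n_classes, dim, noise, rng):
    """Synthetic stand-in for one layer's frame activations (assumption).

    Each class gets a random mean vector; frames are the mean plus
    Gaussian noise. Larger noise mimics a less informative layer.
    """
    means = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
    data = []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            data.append(([m + rng.gauss(0, noise) for m in mean], label))
    rng.shuffle(data)
    return data


def class_scores(weights, bias, x):
    """Linear score for every class given one feature vector."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, bias)]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_probe(train, n_classes, dim, epochs=20, lr=0.1):
    """Fit a linear softmax classifier with plain SGD on frozen features."""
    weights = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in train:
            probs = softmax(class_scores(weights, bias, x))
            for c in range(n_classes):
                # Gradient of cross-entropy with respect to the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                bias[c] -= lr * grad
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
    return weights, bias


def accuracy(model, data):
    weights, bias = model
    correct = 0
    for x, y in data:
        scores = class_scores(weights, bias, x)
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == y)
    return correct / len(data)


def probe_layer(noise, seed=0, n_classes=4, dim=8):
    """Train and evaluate a probe on one synthetic 'layer'."""
    rng = random.Random(seed)
    data = make_frames(60, n_classes, dim, noise, rng)
    split = int(0.8 * len(data))  # 80/20 train/test split
    model = train_probe(data[:split], n_classes, dim)
    return accuracy(model, data[split:])


if __name__ == "__main__":
    for name, noise in [("informative layer", 0.1), ("noisy layer", 5.0)]:
        print(name, round(probe_layer(noise), 3))
```

With real data, `make_frames` would be replaced by activations extracted from a chosen layer and aligned to frame-level labels, and the same probe would be trained once per layer so that accuracies can be compared across layers.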

Code & Models

Repositories

boknilev/asr-repr-analysis (torch; official)
