Chain-based Discriminative Autoencoders for Speech Recognition
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang

TL;DR
This paper introduces chain-based discriminative autoencoders (c-DcAE) and their hierarchical and parallel variants for speech recognition, improving robustness and accuracy by combining reconstruction and phonetic embedding objectives.
Contribution
The paper proposes a chain-based discriminative autoencoder and hierarchical and parallel variants of it, which improve speech recognition by combining a mutual-information objective with multi-task learning.
Findings
c-DcAE outperforms baseline systems on WSJ and Aurora-4 datasets.
Hierarchical and parallel structures improve robustness to noisy speech.
Mutual information-based objective enhances phonetic embedding quality.
Abstract
In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is developed based on the Kaldi toolkit by treating various TDNN models as encoders. In this paper, we further propose three new versions of DcAE. First, a new objective function that considers both categorical cross-entropy and mutual information between ground truth and predicted triphone-state sequences is used. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we…
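The two training schemes described in the abstract can be illustrated with a per-frame loss that adds a squared reconstruction error to a categorical cross-entropy over triphone-state scores. The following is a minimal sketch, not the authors' Kaldi implementation: the function names, the `recon_weight` parameter, and the way the two terms are summed are assumptions made for illustration, and the mutual-information term used by c-DcAE is not covered.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dcae_frame_loss(x, x_hat, state_scores, target_state, recon_weight=1.0):
    """Hypothetical per-frame DcAE-style loss.

    x            : input feature vector for one frame
    x_hat        : decoder reconstruction of x
    state_scores : unnormalized triphone-state scores from the code layer
    target_state : index of the ground-truth triphone state
    recon_weight : assumed trade-off weight (not taken from the paper)
    """
    # Reconstruction term: squared error between input and reconstruction.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Classification term: categorical cross-entropy against the true state.
    probs = softmax(state_scores)
    ce = -math.log(probs[target_state])
    return ce + recon_weight * recon, ce, recon


# Example frame with 3 feature dimensions and 3 candidate states.
total, ce, recon = dcae_frame_loss(
    x=[0.5, -1.0, 2.0],
    x_hat=[0.4, -0.8, 2.1],
    state_scores=[2.0, 0.1, -1.0],
    target_state=0,
)
```

Setting `recon_weight` to 0 reduces the sketch to an ordinary frame-level classifier, which makes the reconstruction term's role as an auxiliary objective easy to see.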
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
