Twin Regularization for online speech recognition

Mirco Ravanelli; Dmitriy Serdyuk; Yoshua Bengio

arXiv:1804.05374·eess.AS·June 13, 2018

TL;DR

This paper introduces a novel regularization technique for unidirectional RNNs in online speech recognition, encouraging hidden states to encode future information without increasing test-time computation.

Contribution

The paper proposes a twin regularization method that, during training, pulls the hidden states of a forward (unidirectional) RNN toward those of a backward RNN, improving robustness in real-time speech recognition without added inference cost.

Findings

01 Effective across multiple datasets and architectures

02 No additional computation during testing

03 Improves robustness in online speech recognition

Abstract

Online speech recognition is crucial for developing natural human-machine interfaces. This modality, however, is significantly more challenging than off-line ASR, since real-time/low-latency constraints inevitably hinder the use of future information, which is known to be very helpful for making robust predictions. A popular way to mitigate this issue is to feed neural acoustic models with context windows that gather some future frames. This introduces a latency that depends on the number of look-ahead features employed. This paper explores a different approach, based on estimating the future rather than waiting for it. Our technique encourages the hidden representations of a unidirectional recurrent network to embed some useful information about the future. Inspired by a recently proposed technique called Twin Networks, we add a regularization term that forces forward…
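
The regularization idea in the abstract can be illustrated with a small sketch. Because the abstract is cut off here, the exact form of the term is an assumption: the code below uses a mean squared L2 distance between time-aligned forward and backward hidden states, with no learned mapping between them. The names `twin_penalty`, `training_loss`, and `weight` are hypothetical and are not taken from the paper or its repositories.

```python
from typing import Sequence

Vector = Sequence[float]


def twin_penalty(forward_states: Sequence[Vector],
                 backward_states: Sequence[Vector]) -> float:
    """Mean squared L2 distance between forward and backward hidden states.

    Assumption: backward_states[t] is already aligned with frame t, i.e. the
    backward network's outputs have been re-ordered to match the forward ones.
    """
    if not forward_states or len(forward_states) != len(backward_states):
        raise ValueError("state sequences must be non-empty and equally long")
    total = 0.0
    for h_fwd, h_bwd in zip(forward_states, backward_states):
        if len(h_fwd) != len(h_bwd):
            raise ValueError("hidden vectors must have the same dimension")
        # Squared Euclidean distance for this time step.
        total += sum((f - b) ** 2 for f, b in zip(h_fwd, h_bwd))
    return total / len(forward_states)


def training_loss(forward_loss: float, backward_loss: float,
                  penalty: float, weight: float = 0.5) -> float:
    """Combine both networks' task losses with the weighted penalty.

    Assumption: a simple weighted sum; the weight value is illustrative only.
    """
    return forward_loss + backward_loss + weight * penalty


# Toy example: two time steps, two hidden units.
forward = [[0.0, 1.0], [1.0, 1.0]]
backward = [[0.0, 0.0], [1.0, 3.0]]
penalty = twin_penalty(forward, backward)        # (1 + 4) / 2 = 2.5
loss = training_loss(1.0, 2.0, penalty, 0.5)     # 1 + 2 + 0.5 * 2.5 = 4.25
```

In a setup like this, the backward network and the penalty would be needed only during training. At test time only the forward network runs, which is consistent with the claim that the method adds no computation during testing.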
