Automatic context window composition for distant speech recognition
Mirco Ravanelli, Maurizio Omologo

TL;DR
This paper introduces an automatic method for designing asymmetric context windows in deep neural networks for distant speech recognition, improving performance especially in reverberant environments by reducing redundancy.
Contribution
It proposes a gradient-based automatic approach for optimizing context window composition, enhancing DNN training efficiency and recognition accuracy in challenging acoustic conditions.
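As a rough illustration of what a gradient analysis over a context window can look like, the sketch below measures how strongly a loss reacts to each frame slot of a spliced input and ranks the slots by that sensitivity. It is not the authors' procedure: the helper `per_frame_gradient_norms`, the finite-difference gradient, the toy linear scorer `toy_loss`, and its hand-picked `weights` are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def per_frame_gradient_norms(
    loss_fn: Callable[[Sequence[float]], float],
    window: Sequence[float],
    frame_dim: int,
    eps: float = 1e-5,
) -> List[float]:
    """Return the L2 norm of d(loss)/d(input) for each frame slot.

    The gradient is estimated with central finite differences, which is
    slow but keeps the sketch free of any deep learning framework.
    """
    if frame_dim <= 0 or len(window) % frame_dim != 0:
        raise ValueError("window length must be a multiple of frame_dim")

    grad = []
    for i in range(len(window)):
        plus = list(window)
        minus = list(window)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss_fn(plus) - loss_fn(minus)) / (2.0 * eps))

    n_slots = len(window) // frame_dim
    return [
        math.sqrt(sum(g * g for g in grad[s * frame_dim:(s + 1) * frame_dim]))
        for s in range(n_slots)
    ]


# Hypothetical toy model: one feature per frame, slots ordered t-2, t-1, t, t+1.
weights = [0.1, 0.5, 1.0, 0.2]


def toy_loss(x: Sequence[float]) -> float:
    """Squared error of a linear scorer against a target of 1.0."""
    y = sum(w * v for w, v in zip(weights, x))
    return 0.5 * (y - 1.0) ** 2


norms = per_frame_gradient_norms(toy_loss, [0.0, 0.0, 0.0, 0.0], frame_dim=1)
# Slot indices sorted from most to least influential on the loss.
ranking = sorted(range(len(norms)), key=lambda s: norms[s], reverse=True)
```

In this toy setting the current frame (slot 2) ranks first and the oldest frame (slot 0) last. With a trained DNN the same ranking would normally come from backpropagated gradients averaged over many training examples.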
Findings
Automatic context window design improves recognition accuracy.
Reduced redundancy in frame configuration enhances training efficiency.
Method is effective across various acoustic environments and DNN architectures.
Abstract
Distant speech recognition is being revolutionized by deep learning, which has contributed to significantly outperforming previous HMM-GMM systems. A key aspect behind the rapid rise and success of DNNs is their ability to better manage large time contexts. In this regard, asymmetric context windows that embed more past than future frames have recently been used with feed-forward neural networks. This context configuration turns out to be useful not only for low-latency speech recognition, but also for boosting recognition performance under reverberant conditions. This paper investigates the mechanisms inside DNNs that lead to an effective application of asymmetric contexts. In particular, we propose a novel method for automatic context window composition based on a gradient analysis. The experiments, performed with different acoustic environments, features, DNN…
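To make the notion of an asymmetric context window concrete, here is a minimal sketch of frame splicing in which each frame is concatenated with more past than future neighbors. The function name `splice_context`, the edge handling (repeating the first or last frame), and the 3-past/1-future configuration are assumptions chosen for illustration, not settings taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]


def splice_context(
    frames: Sequence[Frame], n_past: int, n_future: int
) -> List[Frame]:
    """Concatenate each frame with n_past preceding and n_future following frames.

    Frames beyond the utterance boundaries are replaced by copies of the
    first or last frame (an assumption; zero padding is another option).
    """
    if n_past < 0 or n_future < 0:
        raise ValueError("n_past and n_future must be non-negative")
    if not frames:
        return []

    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window: Frame = []
        for offset in range(-n_past, n_future + 1):
            idx = min(max(t + offset, 0), last)  # clamp to valid frame indices
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy utterance: 5 frames with a single feature equal to the frame index.
feats = [[float(t)] for t in range(5)]

# Asymmetric window: 3 past frames, the current frame, 1 future frame.
asym = splice_context(feats, n_past=3, n_future=1)
```

Each spliced vector has `(n_past + 1 + n_future)` times the original feature dimension, so reducing `n_future` lowers both the input size and the look-ahead latency.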
