# Learning Waveform-Based Acoustic Models using Deep Variational Convolutional Neural Networks

**Authors:** Dino Oglic, Zoran Cvetkovic, Peter Sollich

arXiv: 1906.09526 · 2021-08-17

## TL;DR

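This paper introduces a stochastic deep convolutional neural network for waveform-based acoustic modeling in speech recognition. The model combines an adaptive parametric filter block with stochastic variational inference. It outperforms comparable waveform-based baselines as well as a recent FBANK-based convolutional network, and the results suggest a possible robustness benefit.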

## Contribution

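The paper proposes a waveform-based acoustic model built on a deep variational CNN. An adaptive parametric convolutional block, whose filters are cosine modulations of compactly supported windows, decomposes speech into frequency sub-bands. The analytically intractable Kullback-Leibler regularization term that arises in variational inference is approximated with Gauss-Hermite quadrature.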
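To make the idea of a parametric filter concrete, the following is a minimal sketch of a filter formed as a cosine modulation of a compactly supported window. It is an illustration under stated assumptions, not the authors' implementation: the Hann window, the function name `cosine_modulated_filter`, the sample rate, the kernel size, and the chosen center frequencies are all hypothetical choices made for this example. In a trainable model the center frequency and window width would be the learned parameters.

```python
import numpy as np


def cosine_modulated_filter(center_hz, width_s, sample_rate=16000, kernel_size=401):
    """Return one filter: a compactly supported window times a cosine carrier.

    Assumption: a Hann window is used here purely for illustration; the
    summary only states that the windows are compactly supported.
    """
    # Time axis in seconds, centred at zero.
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / sample_rate
    # Hann window supported on [-width_s / 2, width_s / 2], zero elsewhere.
    inside = np.abs(t) <= width_s / 2
    window = np.where(inside, 0.5 * (1.0 + np.cos(2.0 * np.pi * t / width_s)), 0.0)
    # Cosine modulation shifts the window's pass-band to center_hz.
    return window * np.cos(2.0 * np.pi * center_hz * t)


# Hypothetical three-filter bank with a 20 ms window.
centers_hz = (500.0, 1000.0, 2000.0)
filters = np.stack([cosine_modulated_filter(f, 0.02) for f in centers_hz])

# One second of a 1 kHz tone as a stand-in for a speech waveform.
waveform = np.sin(2.0 * np.pi * 1000.0 * np.arange(16000) / 16000)

# Sub-band decomposition: convolve the waveform with each filter.
subbands = np.stack([np.convolve(waveform, h, mode="same") for h in filters])
energies = (subbands ** 2).mean(axis=1)
```

With this toy input, the sub-band centered at 1 kHz carries the most energy, which is the behavior one would expect from a band-pass decomposition.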

## Key findings

- Outperforms comparable waveform-based baselines.
- Outperforms a recently proposed deep convolutional neural network for robust acoustic modeling that uses standard FBANK features.
- Indicates that the approach could improve robustness in speech recognition.

## Abstract

We investigate the potential of stochastic neural networks for learning effective waveform-based acoustic models. The waveform-based setting, inherent to fully end-to-end speech recognition systems, is motivated by several comparative studies of automatic and human speech recognition that associate standard non-adaptive feature extraction techniques with information loss which can adversely affect robustness. Stochastic neural networks, on the other hand, are a class of models capable of incorporating rich regularization mechanisms into the learning process. We consider a deep convolutional neural network that first decomposes speech into frequency sub-bands via an adaptive parametric convolutional block where filters are specified by cosine modulations of compactly supported windows. The network then employs standard non-parametric 1D convolutions to extract relevant spectro-temporal patterns while gradually compressing the structured high dimensional representation generated by the parametric block. We rely on a probabilistic parametrization of the proposed neural architecture and learn the model using stochastic variational inference. This requires evaluation of an analytically intractable integral defining the Kullback-Leibler divergence term responsible for regularization, for which we propose an effective approximation based on the Gauss-Hermite quadrature. Our empirical results demonstrate a superior performance of the proposed approach over comparable waveform-based baselines and indicate that it could lead to robustness. Moreover, the approach outperforms a recently proposed deep convolutional neural network for learning of robust acoustic models with standard FBANK features.
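The Gauss-Hermite step mentioned in the abstract can be sketched as follows. This is a generic illustration, not the paper's exact formulation: it assumes a one-dimensional Gaussian variational posterior and approximates the expectation of a log-prior under it. The function names and the Laplace prior are hypothetical choices for this example; the summary does not state which prior the authors use. The standard normal prior is included only because its Kullback-Leibler divergence has a closed form that the quadrature can be checked against.

```python
import numpy as np


def gauss_hermite_expectation(f, mu, sigma, order=20):
    """Approximate E[f(x)] for x ~ N(mu, sigma^2) with Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(order)
    # Change of variables x = mu + sqrt(2) * sigma * node maps the
    # exp(-node^2) weight onto the Gaussian density.
    return np.sum(weights * f(mu + np.sqrt(2.0) * sigma * nodes)) / np.sqrt(np.pi)


def kl_gaussian_to_prior(log_prior, mu, sigma, order=20):
    """KL(q || p) for q = N(mu, sigma^2), with E_q[log p] taken by quadrature."""
    neg_entropy = -0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)
    return neg_entropy - gauss_hermite_expectation(log_prior, mu, sigma, order)


def log_standard_normal(x):
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * x ** 2


def log_laplace(x, scale=1.0):
    # Hypothetical non-Gaussian prior, used only to show the general recipe.
    return -np.log(2.0 * scale) - np.abs(x) / scale


mu, sigma = 0.3, 0.5

# Sanity check against the closed form for a standard normal prior.
kl_quadrature = kl_gaussian_to_prior(log_standard_normal, mu, sigma)
kl_closed_form = 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - np.log(sigma ** 2))

# The same routine applied to a prior that is not Gaussian.
kl_laplace = kl_gaussian_to_prior(log_laplace, mu, sigma)
```

Because the quadrature nodes and weights are fixed, the approximation is a deterministic, differentiable function of `mu` and `sigma`, which is what makes this kind of estimate convenient as a regularization term during gradient-based training.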

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09526/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09526/full.md

## References

86 references — full list in the complete paper: https://tomesphere.com/paper/1906.09526/full.md

---
Source: https://tomesphere.com/paper/1906.09526