# End-to-End Visual Speech Recognition for Small-Scale Datasets

**Authors:** Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic

arXiv: 1904.01954 · 2019-07-10

## TL;DR

This paper presents an end-to-end visual speech recognition system designed for small-scale datasets. It uses fully-connected layers and LSTMs to learn features and classification jointly, and reports absolute improvements of 0.6% to 11.4% over the previous state of the art on four databases.

## Contribution

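The authors propose an end-to-end model for small-scale datasets. Two streams extract features directly from mouth images and difference images, a BLSTM models the temporal dynamics of each stream, and a further BLSTM fuses the two. The design targets the setting where most existing methods under-perform because they need large amounts of training data.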
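As a rough illustration of the dual-stream layout with BLSTM fusion, here is a minimal PyTorch sketch. It is not the authors' implementation: the class names `StreamEncoder` and `TwoStreamBLSTM`, all layer sizes, the zero-valued first difference frame, and the mean-over-time pooling before the classifier are assumptions made for the example. Only the overall structure (fully-connected feature layers, one BLSTM per stream, one BLSTM for fusion) follows the description in the abstract.

```python
import torch
import torch.nn as nn


class StreamEncoder(nn.Module):
    """One stream: fully-connected feature layers followed by a BLSTM.

    Layer sizes are illustrative assumptions, not values from the paper.
    """

    def __init__(self, input_dim, bottleneck_dim=50, hidden_dim=64):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Linear(256, bottleneck_dim),
        )
        self.blstm = nn.LSTM(
            bottleneck_dim, hidden_dim, batch_first=True, bidirectional=True
        )

    def forward(self, x):
        # x: (batch, time, input_dim) -> (batch, time, 2 * hidden_dim)
        out, _ = self.blstm(self.fc(x))
        return out


class TwoStreamBLSTM(nn.Module):
    """Hypothetical two-stream model: raw frames + difference frames."""

    def __init__(self, input_dim, num_classes, hidden_dim=64):
        super().__init__()
        self.raw_stream = StreamEncoder(input_dim, hidden_dim=hidden_dim)
        self.diff_stream = StreamEncoder(input_dim, hidden_dim=hidden_dim)
        # The fusion BLSTM consumes the concatenated outputs of both streams.
        self.fusion = nn.LSTM(
            4 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True
        )
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, frames):
        # frames: (batch, time, height * width), flattened mouth images.
        # Assumption: difference images are frame-to-frame differences,
        # with a zero first frame so both streams have the same length.
        diffs = torch.zeros_like(frames)
        diffs[:, 1:] = frames[:, 1:] - frames[:, :-1]

        fused_in = torch.cat(
            [self.raw_stream(frames), self.diff_stream(diffs)], dim=-1
        )
        fused, _ = self.fusion(fused_in)
        # Assumption: average over time to get one prediction per sequence.
        return self.classifier(fused.mean(dim=1))


model = TwoStreamBLSTM(input_dim=16 * 16, num_classes=10)
logits = model(torch.randn(2, 5, 16 * 16))  # 2 sequences of 5 frames each
```

Because every layer sits in one module, a single loss on `logits` updates the feature layers and the classifier together, which is the joint learning the summary refers to.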

## Key findings

- Reported absolute improvements of 0.6%, 3.4%, 3.9%, and 11.4% over the state of the art on OuluVS2, CUAVE, AVLetters, and AVLetters2, respectively.
- Demonstrated effectiveness of joint feature and classifier learning.
- Validated suitability for small-scale visual speech recognition tasks.

## Abstract

Visual speech recognition models traditionally consist of two stages, feature extraction and classification. Several deep learning approaches have been recently presented aiming to replace the feature extraction stage by automatically extracting features from mouth images. However, research on joint learning of features and classification remains limited. In addition, most of the existing methods require large amounts of data in order to achieve state-of-the-art performance, otherwise they under-perform. In this work, we present an end-to-end visual speech recognition system based on fully-connected layers and Long Short-Term Memory (LSTM) networks which is suitable for small-scale datasets. The model consists of two streams which extract features directly from the mouth and difference images, respectively. The temporal dynamics in each stream are modelled by a Bidirectional LSTM (BLSTM) and the fusion of the two streams takes place via another BLSTM. An absolute improvement of 0.6%, 3.4%, 3.9%, 11.4% over the state-of-the-art is reported on the OuluVS2, CUAVE, AVLetters and AVLetters2 databases, respectively.
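The second stream operates on difference images. The abstract does not spell out how they are computed, so the NumPy sketch below assumes the common definition: each difference image is the current mouth frame minus the previous one. The function name `difference_images` and the choice to keep an all-zero first frame (so the output has as many frames as the input) are assumptions for illustration.

```python
import numpy as np


def difference_images(frames):
    """Return frame-to-frame differences for a (time, height, width) array.

    Assumption: the first output frame is all zeros so that the raw and
    difference sequences have the same number of frames.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.zeros_like(frames)
    diffs[1:] = frames[1:] - frames[:-1]
    return diffs


# Toy sequence: 3 frames of 2x2 pixels with values 0..11.
frames = np.arange(3 * 2 * 2, dtype=np.float32).reshape(3, 2, 2)
diffs = difference_images(frames)
```

In this toy sequence every pixel increases by 4 between consecutive frames, so both frames after the first are constant images of value 4. For real mouth crops, the difference images highlight the regions that move between frames.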

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.01954/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.01954/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1904.01954/full.md

---
Source: https://tomesphere.com/paper/1904.01954