Deep word embeddings for visual speech recognition

Themos Stafylakis; Georgios Tzimiropoulos

arXiv:1710.11201 · cs.CV · November 1, 2017

TL;DR

This paper introduces a deep learning architecture that extracts word embeddings from video of the mouth region. The embeddings retain the information needed for word recognition while suppressing speaker, pose, and illumination variability, and they support low-shot recognition of words unseen during training.

Contribution

The paper presents a deep architecture for visual speech recognition that produces word embeddings and, by modelling them with PLDA, demonstrates low-shot recognition of words unseen during training.

Findings

01

Achieved an 11.92% error rate on 500-word closed-set recognition on the Lipreading in-the-wild database

02

Embeddings effectively model unseen words in low-shot learning scenarios

03

System surpasses the previous state of the art on closed-set word identification

Abstract

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition. The embeddings summarize the information of the mouth region that is relevant to the problem of word recognition, while suppressing other types of variability such as speaker, pose and illumination. The system consists of a spatiotemporal convolutional layer, a Residual Network and bidirectional LSTMs and is trained on the Lipreading in-the-wild database. We first show that the proposed architecture goes beyond state-of-the-art on closed-set word identification, by attaining an 11.92% error rate on a vocabulary of 500 words. We then examine the capacity of the embeddings in modelling words unseen during training. We deploy Probabilistic Linear Discriminant Analysis (PLDA) to model the embeddings and perform low-shot learning experiments on words unseen during training.…
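The low-shot setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the paper models embeddings with PLDA, whereas the sketch below swaps in a much simpler stand-in (cosine similarity to per-word mean embeddings) purely to show the enroll-then-score workflow. The function names, the three-dimensional toy vectors, and the labels `word_a` and `word_b` are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enroll(examples_by_word):
    """Build one prototype per word from a handful of example embeddings.

    Assumption: a prototype is the normalized mean of the normalized
    examples. The paper uses PLDA instead; this is a simplified stand-in.
    """
    prototypes = {}
    for word, examples in examples_by_word.items():
        normalized = [l2_normalize(e) for e in examples]
        dim = len(normalized[0])
        mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
        prototypes[word] = l2_normalize(mean)
    return prototypes


def classify(embedding, prototypes):
    """Score a test embedding against every prototype by cosine similarity."""
    query = l2_normalize(embedding)
    scores = {
        word: sum(q * p for q, p in zip(query, proto))
        for word, proto in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two words unseen during training, two enrollment examples each.
few_shot = {
    "word_a": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "word_b": [[0.1, 0.9, 0.2], [0.0, 1.0, 0.1]],
}
prototypes = enroll(few_shot)
word, scores = classify([0.7, 0.3, 0.05], prototypes)
print(word, scores)
```

In a real system the vectors would come from the trained network rather than being written by hand, and a PLDA back-end would replace the cosine scoring with a likelihood ratio that accounts for within-word and between-word variability.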

Code & Models

Repositories

tstafylakis/Lipreading-ResNet (PyTorch, official)
