Lip Reading Using Convolutional Auto Encoders as Feature Extractor

Dharin Parekh; Ankitesh Gupta; Shharrnam Chhatpar; Anmol Yash Kumar; Manasi Kulkarni

arXiv:1805.12371 · cs.CV · June 1, 2018

TL;DR

This paper introduces a word-level lip-reading model that uses convolutional autoencoders as feature extractors and a long short-term memory (LSTM) network for classification. It reports 98% accuracy on MIRACL-VC1, above the 93.4% benchmark, and outperforms a CNN and LSTM baseline on BBC's LRW dataset.

Contribution

The paper presents a new word-level lip-reading model that employs convolutional autoencoders for feature extraction, outperforming existing models on benchmark datasets.

Findings

1. Achieved 98% classification accuracy on MIRACL-VC1, surpassing the 93.4% benchmark (Rekik et al., 2014).

2. Performed better than the baseline CNN and LSTM model on BBC's LRW dataset.

3. Demonstrated that convolutional autoencoder features are effective inputs for word-level lip reading.

Abstract

Lip reading is the visual recognition of speech from lip movements. Recent work in this nascent field uses various neural networks as feature extractors whose outputs feed a model that captures temporal relationships and performs classification. Although end-to-end sentence-level lip reading is the current trend, we propose a new model that performs word-level classification and breaks the established benchmarks on standard datasets. Our model uses convolutional autoencoders as feature extractors, whose outputs are fed to a long short-term memory (LSTM) model. We tested the proposed model on BBC's LRW dataset, MIRACL-VC1, and the GRID dataset, achieving a classification accuracy of 98% on MIRACL-VC1 compared with the 93.4% set benchmark (Rekik et al., 2014). On BBC's LRW, the proposed model performed better than the baseline model of convolutional neural networks and long short-term memory…
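The abstract describes a two-stage pipeline: a convolutional autoencoder encodes each lip frame, and an LSTM reads the sequence of encodings to classify the spoken word. The NumPy sketch below illustrates only that data flow and is not the authors' implementation. The 32x32 frame size, the single convolution layer with average pooling, the 4 filters, the 32 LSTM units, and the 10 output classes are all hypothetical choices; weights are random and no training is shown. It also assumes, as is common for autoencoder feature extractors, that the autoencoder is trained on a reconstruction loss and only its encoder output is passed on to the LSTM.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical sizes (not from the paper): 32x32 grayscale lip crops,
# 4 encoder filters, 32 LSTM units, 10 word classes.
FRAME, K, HIDDEN, NUM_CLASSES = 32, 4, 32, 10


def conv2d_same(x, w):
    """3x3 'same' cross-correlation. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H and W
    win = sliding_window_view(xp, (3, 3), axis=(1, 2))  # (C_in, H, W, 3, 3)
    return np.einsum("chwij,ocij->ohw", win, w)


def encode(frame, w_enc):
    """Encoder: conv + ReLU + 2x2 average pooling -> latent code (K, H/2, W/2)."""
    h = np.maximum(conv2d_same(frame[None], w_enc), 0.0)  # add channel axis
    c, hh, ww = h.shape
    return h.reshape(c, hh // 2, 2, ww // 2, 2).mean(axis=(2, 4))


def decode(code, w_dec):
    """Decoder: 2x nearest-neighbour upsampling + conv back to one channel."""
    up = code.repeat(2, axis=1).repeat(2, axis=2)
    return conv2d_same(up, w_dec)[0]


def reconstruction_loss(frame, w_enc, w_dec):
    """Mean squared error the autoencoder would be trained to minimise."""
    return float(np.mean((decode(encode(frame, w_enc), w_dec) - frame) ** 2))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_hidden(seq, wx, wh, b):
    """Run a single-layer LSTM over seq (T, D) and return the final hidden state."""
    n = wh.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    for x in seq:
        i, f, g, o = np.split(wx @ x + wh @ h + b, 4)  # input, forget, cell, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def init_params(rng):
    d = K * (FRAME // 2) ** 2  # flattened latent size per frame
    s = 0.05
    return {
        "w_enc": rng.normal(0, s, (K, 1, 3, 3)),
        "w_dec": rng.normal(0, s, (1, K, 3, 3)),
        "wx": rng.normal(0, s, (4 * HIDDEN, d)),
        "wh": rng.normal(0, s, (4 * HIDDEN, HIDDEN)),
        "b": np.zeros(4 * HIDDEN),
        "w_out": rng.normal(0, s, (NUM_CLASSES, HIDDEN)),
        "b_out": np.zeros(NUM_CLASSES),
    }


def classify_clip(frames, p):
    """frames: (T, H, W) -> probability distribution over word classes."""
    feats = np.stack([encode(f, p["w_enc"]).ravel() for f in frames])  # (T, D)
    h = lstm_last_hidden(feats, p["wx"], p["wh"], p["b"])
    logits = p["w_out"] @ h + p["b_out"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


rng = np.random.default_rng(0)
params = init_params(rng)
clip = rng.random((12, FRAME, FRAME))  # 12 random "frames" stand in for a video
probs = classify_clip(clip, params)
loss = reconstruction_loss(clip[0], params["w_enc"], params["w_dec"])
print(probs.argmax(), round(loss, 4))
```

Because nothing is trained, the printed class index and loss are arbitrary. Under this sketch's assumptions, the encoder and decoder weights would first be fitted by minimising `reconstruction_loss` over training frames, after which the LSTM and output layer would be trained on word labels using the frozen encoder features.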
