LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Kai Xu, Dawei Li, Nick Cassimatis, Xiaolong Wang

TL;DR
LCANet is an end-to-end lipreading system that encodes video frames with a stacked 3D CNN, a highway network, and a bidirectional GRU, then decodes them with a cascaded attention-CTC decoder. It reports 1.3% CER and 3.0% WER on the GRID corpus.
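As a rough illustration of the highway component named above, the sketch below implements one generic highway layer in plain Python: a transform gate T blends a nonlinear transform H(x) with the untouched input x. The tanh nonlinearity, the parameter names (w_h, b_h, w_t, b_t), and the list-of-lists weight layout are assumptions made for this example; the summary does not state how LCANet configures its highway layers.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H uses tanh here (an assumption); T is a sigmoid gate.
    Input and output must have the same dimension.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

A strongly negative gate bias drives T toward 0, so the layer passes its input through almost unchanged; this is the usual argument for why highway layers are easy to stack.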
Contribution
The paper introduces LCANet, a novel deep neural network architecture with a cascaded attention-CTC decoder that enhances lipreading performance and convergence speed.
Findings
Achieves 1.3% character error rate (CER) and 3.0% word error rate (WER) on the GRID corpus
Reports a 12.3% improvement over prior state-of-the-art methods
Effectively captures spatio-temporal information in videos
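For readers unfamiliar with the two metrics above, here is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, taken over characters or over whitespace-separated words. The function names and the example sentence are made up for illustration, and the paper's exact evaluation script may differ (for instance in how it treats spaces or averages over utterances).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters here."""
    return error_rate(list(reference), list(hypothesis))


# Made-up example: one wrong letter in a six-word command.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because a single wrong character makes a whole word wrong, WER is usually higher than CER on the same output, which matches the ordering of the two reported figures.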
Abstract
Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions including lips, face, and tongue. Recently, deep neural network based lipreading methods have shown great potential and have exceeded the accuracy of experienced human lipreaders on some benchmark datasets. However, lipreading is still far from being solved, and existing methods tend to have high error rates on in-the-wild data. In this paper, we propose LCANet, an end-to-end deep neural network based lipreading system. LCANet encodes input video frames using a stacked 3D convolutional neural network (CNN), a highway network, and a bidirectional GRU network. The encoder effectively captures both short-term and long-term spatio-temporal information. More importantly, LCANet incorporates a cascaded attention-CTC decoder to generate…
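To give a feel for the CTC half of the decoder mentioned in the abstract, the sketch below shows plain CTC best-path (greedy) decoding: pick the top-scoring symbol per frame, merge consecutive repeats, then drop the blank symbol. This is standard CTC decoding, not LCANet's cascaded attention-CTC decoder, whose details are cut off in the truncated abstract. The blank symbol "_", the function names, and the toy scores are all assumptions for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this example


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


def best_path_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding from per-frame score vectors.

    frame_scores: one list of scores per video frame, aligned with alphabet.
    """
    frame_labels = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    return ctc_collapse(frame_labels, blank)
```

A blank between two identical labels keeps them distinct, so the frame sequence "l_l" decodes to "ll" while "ll" decodes to "l". This is how CTC lets an output of unknown length be read off a fixed number of video frames without a frame-level alignment.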
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Hand Gesture Recognition Systems
Methods: Sigmoid Activation · Highway Layer · Highway Network · Gated Recurrent Unit
