Lip-reading with Densely Connected Temporal Convolutional Networks

Pingchuan Ma; Yujiang Wang; Jie Shen; Stavros Petridis; Maja Pantic

arXiv:2009.14233·cs.CV·September 30, 2022·6 cites

TL;DR

This paper introduces DC-TCN, a densely connected temporal convolutional network with Squeeze-and-Excitation attention, which achieves state-of-the-art lip-reading accuracy on the LRW and LRW-1000 datasets.

Contribution

The paper adds dense connections and Squeeze-and-Excitation attention blocks to a temporal convolutional network (TCN) for isolated-word lip-reading, surpassing existing methods on the LRW and LRW-1000 benchmarks.
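The abstract names the attention blocks as Squeeze-and-Excitation (SE) blocks, a light-weight attention mechanism. The NumPy sketch below shows a generic SE block on a temporal feature map of shape (channels, time). It averages over time (squeeze), passes the result through a two-layer bottleneck with a sigmoid gate (excitation), and then rescales each channel. The function name, feature sizes, reduction ratio, and random untrained weights are hypothetical. The sketch does not show where DC-TCN places its SE blocks.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excitation_1d(x, w1, b1, w2, b2):
    """Generic SE block (hypothetical helper) for a (channels, time) feature map.

    w1: (channels // r, channels), w2: (channels, channels // r),
    where r is the bottleneck reduction ratio.
    """
    # Squeeze: average each channel over all time steps.
    z = x.mean(axis=1)
    # Excitation: bottleneck MLP, ReLU, then a sigmoid gate per channel.
    h = np.maximum(0.0, w1 @ z + b1)
    s = sigmoid(w2 @ h + b2)
    # Scale: multiply every time step of a channel by that channel's gate.
    return x * s[:, None], s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, time, reduction = 8, 29, 4  # hypothetical sizes
    hidden = channels // reduction
    x = rng.standard_normal((channels, time))
    # Random weights stand in for learned parameters (assumption).
    w1 = rng.standard_normal((hidden, channels)) * 0.1
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((channels, hidden)) * 0.1
    b2 = np.zeros(channels)
    y, gates = squeeze_excitation_1d(x, w1, b1, w2, b2)
    print(y.shape, gates.round(3))
```

The gate vector holds one value between 0 and 1 per channel. The block can therefore only reweight channels and never changes the shape of the feature map. Its only parameters are the two small bottleneck layers, which keeps it light-weight.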

Findings

01. Achieved 88.36% accuracy on the Lip Reading in the Wild (LRW) dataset.

02. Achieved 43.65% accuracy on the LRW-1000 dataset.

03. Surpassed all baseline methods, setting new state-of-the-art results on both datasets.

Abstract

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words. Although Temporal Convolutional Networks (TCNs) have recently demonstrated great potential in many vision tasks, their receptive fields are not dense enough to model the complex temporal dynamics in lip-reading scenarios. To address this problem, we introduce dense connections into the network to capture more robust temporal features. Moreover, our approach utilises the Squeeze-and-Excitation block, a light-weight attention mechanism, to further enhance the model's classification power. Without bells and whistles, our DC-TCN method achieves 88.36% accuracy on the Lip Reading in the Wild (LRW) dataset and 43.65% on the LRW-1000 dataset, surpassing all the baseline methods and setting a new state of the art on both datasets.
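The claim that TCN receptive fields are "not dense enough" can be illustrated by counting receptive-field sizes. The sketch below uses a simplified model: a block of 1-D dilated convolutions with one shared kernel size and hypothetical dilations (1, 2, 4). It counts receptive fields per path and ignores the residual connections of a standard TCN. Under plain stacking, the block output sees a single receptive field. Under dense (concatenating) connections, any subset of layers forms a path to the output, so many sizes coexist. This illustrates the intuition only and is not the paper's architecture or analysis. All function names are hypothetical.

```python
from itertools import combinations


def layer_span(kernel_size: int, dilation: int) -> int:
    """Frames one dilated 1-D conv layer adds to the receptive field."""
    return (kernel_size - 1) * dilation


def plain_tcn_receptive_field(kernel_size: int, dilations: list[int]) -> int:
    # Plain stacking (no skips): the output only sees the path through every layer.
    return 1 + sum(layer_span(kernel_size, d) for d in dilations)


def dense_tcn_receptive_fields(kernel_size: int, dilations: list[int]) -> list[int]:
    # Dense connectivity: each layer receives the block input and all earlier
    # outputs, so every subset of layers (kept in order) is a path to the
    # concatenated block output. The empty subset is the block input itself.
    spans = [layer_span(kernel_size, d) for d in dilations]
    sizes = set()
    for r in range(len(spans) + 1):
        for subset in combinations(spans, r):
            sizes.add(1 + sum(subset))
    return sorted(sizes)


if __name__ == "__main__":
    dilations = [1, 2, 4]  # hypothetical dilation schedule
    print(plain_tcn_receptive_field(3, dilations))
    print(dense_tcn_receptive_fields(3, dilations))
```

Running it prints `15` and then `[1, 3, 5, 7, 9, 11, 13, 15]`. The same three layers yield one receptive-field size when stacked plainly but eight distinct sizes when densely connected.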

Code & Models

Repositories

mpc001/Lipreading_using_Temporal_Convolutional_Networks
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Face recognition and analysis

Methods: Dense Connections