TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading

Byung Hoon Lee; Wooseok Shin; Sung Won Han

arXiv:2506.16073·cs.CV·January 6, 2026

TL;DR

TD3Net is a densely connected, multi-dilated temporal convolutional network used as the backend for word-level lipreading. It models complex temporal features with fewer parameters while achieving competitive accuracy on large datasets.

Contribution

The paper proposes TD3Net, a new architecture combining dense skip connections and multi-dilated convolutions to enhance temporal modeling in lipreading backend networks.

Findings

01. Achieves accuracy comparable to or better than state-of-the-art methods.

02. Uses fewer parameters and incurs lower computational cost.

03. Captures diverse temporal features while preserving temporal continuity.

Abstract

The word-level lipreading approach typically employs a two-stage framework with separate frontend and backend architectures to model dynamic lip movements. Each component has been extensively studied, and in the backend architecture, temporal convolutional networks (TCNs) have been widely adopted in state-of-the-art methods. Recently, dense skip connections have been introduced in TCNs to mitigate the limited density of the receptive field, thereby improving the modeling of complex temporal representations. However, their performance remains constrained owing to potential information loss regarding the continuous nature of lip movements, caused by blind spots in the receptive field. To address this limitation, we propose TD3Net, a temporal densely connected multi-dilated convolutional network that combines dense skip connections and multi-dilated temporal convolutions as the backend…
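
The "blind spots" mentioned in the abstract can be illustrated with simple receptive-field arithmetic. The sketch below is not the paper's implementation. It assumes a toy dense block with kernel size 3 and dilations 1, 2, 4, and all function names (`taps`, `compose`, `has_blind_spot`) are hypothetical. It compares applying one dilation to every densely connected input against choosing a dilation per input, which is one plausible reading of "multi-dilated" convolution.

```python
def taps(kernel_size, dilation):
    """Time offsets sampled by a centered 1D kernel with the given dilation."""
    half = kernel_size // 2
    return {dilation * k for k in range(-half, half + 1)}


def compose(receptive_field, tap_offsets):
    """Receptive field after a convolution reads from a feature map
    whose own receptive field is `receptive_field`."""
    return {r + t for r in receptive_field for t in tap_offsets}


def has_blind_spot(receptive_field):
    """True if some frame between the earliest and latest offset is never seen."""
    span = max(receptive_field) - min(receptive_field) + 1
    return len(receptive_field) != span


# Feature maps available through dense skip connections (toy assumption).
raw = {0}                               # input frames
layer1 = compose(raw, taps(3, 1))       # offsets -1..1
layer2 = compose(layer1, taps(3, 2))    # offsets -3..3
inputs = {"raw": raw, "layer1": layer1, "layer2": layer2}

# Case A: a single dilation of 4 applied to every skip-connected input.
single = {name: compose(rf, taps(3, 4)) for name, rf in inputs.items()}

# Case B: a dilation matched to each input's existing receptive field.
per_input_dilation = {"raw": 1, "layer1": 2, "layer2": 4}
multi = {
    name: compose(rf, taps(3, per_input_dilation[name]))
    for name, rf in inputs.items()
}

for name in inputs:
    print(name, "single:", has_blind_spot(single[name]),
          "multi:", has_blind_spot(multi[name]))
```

In this toy setting, the single-dilation case skips frames when it reads the shallow inputs directly, while the per-input dilations keep every receptive field contiguous. Whether TD3Net assigns dilations in exactly this way is not stated in the truncated abstract.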
