Designing Practical Models for Isolated Word Visual Speech Recognition

Iason Ioannis Panagos; Giorgos Sfikas; Christophoros Nikou

arXiv:2508.17894·cs.CV·August 26, 2025

TL;DR

This paper develops lightweight, resource-efficient visual speech recognition models that maintain high accuracy, enabling practical deployment in resource-constrained environments by benchmarking and adapting efficient neural network architectures.

Contribution

The paper introduces novel low-resource VSR architectures built from efficient image classification models and lightweight temporal convolution blocks, with the aim of reducing hardware costs.
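To give a rough sense of why a lightweight temporal convolution block can cut hardware cost, the sketch below compares parameter counts for a standard 1D convolution and a depthwise-separable one. This is only an illustration: the summary does not say which lightweight block the authors use, so the depthwise-separable design, the function names, and the channel and kernel sizes are all assumptions.

```python
def standard_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a standard 1D (temporal) convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameter count of a depthwise conv followed by a 1x1 pointwise conv."""
    # Depthwise: one kernel per input channel, no channel mixing.
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    # Pointwise: 1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical sizes chosen for illustration, not taken from the paper.
channels, kernel = 512, 3

standard = standard_conv1d_params(channels, channels, kernel)
separable = depthwise_separable_conv1d_params(channels, channels, kernel)
ratio = standard / separable

print(f"standard:  {standard} parameters")
print(f"separable: {separable} parameters")
print(f"reduction: {ratio:.2f}x")
```

With these assumed sizes the separable layer needs roughly a third of the parameters; the saving grows with larger kernels, since the kernel size only multiplies the small depthwise term.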

Findings

01. Achieved strong recognition performance with low-resource models.

02. Demonstrated effectiveness on a large English word database.

03. Models are suitable for practical, resource-constrained applications.

Abstract

Visual speech recognition (VSR) systems decode spoken words from an input sequence using only video data. Practical applications of such systems include medical assistance and human-machine interaction. A VSR system is typically employed in a complementary role when the audio is corrupted or unavailable. To predict the spoken words accurately, these architectures often rely on deep neural networks to extract meaningful representations from the input sequence. While deep architectures achieve impressive recognition performance, they incur significant computation costs, which translate into increased hardware requirements and limit applicability in real-world scenarios where resources might be constrained. This factor prevents wider adoption and deployment of speech recognition systems in…
