Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

David Gimeno-Gómez; Carlos-D. Martínez-Hinarejos

arXiv:2502.00464 · cs.CV · February 18, 2025

TL;DR

This paper advances Spanish lipreading by developing an end-to-end hybrid CTC/Attention system, achieving state-of-the-art results across two datasets, and providing a new benchmark with comprehensive analysis.

Contribution

It introduces a novel end-to-end Spanish lipreading system, evaluates it on multiple datasets, and establishes a new benchmark with detailed ablation and error analysis.

Findings

01. State-of-the-art results on two Spanish lipreading datasets

02. Significant performance improvements over previous methods

03. Comprehensive analysis of architecture components and error factors

Abstract

Visual speech recognition remains an open research problem where different challenges must be considered by dispensing with the auditory sense, such as visual ambiguities, the inter-personal variability among speakers, and the complex modeling of silence. Nonetheless, recent remarkable results have been achieved in the field thanks to the availability of large-scale databases and the use of powerful attention mechanisms. Besides, multiple languages apart from English are nowadays a focus of interest. This paper presents noticeable advances in automatic continuous lipreading for Spanish. First, an end-to-end system based on the hybrid CTC/Attention architecture is presented. Experiments are conducted on two corpora of disparate nature, reaching state-of-the-art results that significantly improve the best performance obtained to date for both databases. In addition, a thorough ablation…
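As a rough illustration of the hybrid CTC/Attention idea named in the abstract: such systems are commonly trained on a weighted interpolation of a CTC loss and an attention-decoder loss. The sketch below shows only that interpolation. The function name, the argument names, and the example weight are assumptions made for illustration; they are not taken from the paper or its repository, and the per-branch losses are treated as already-computed numbers.

```python
def hybrid_ctc_attention_loss(ctc_loss: float,
                              attention_loss: float,
                              ctc_weight: float) -> float:
    """Interpolate a CTC loss and an attention-decoder loss.

    ctc_weight is a hypothetical hyperparameter in [0, 1]:
    1.0 uses only the CTC branch, 0.0 only the attention branch.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Example with made-up loss values and an arbitrary weight.
combined = hybrid_ctc_attention_loss(ctc_loss=2.0,
                                     attention_loss=1.0,
                                     ctc_weight=0.5)
print(combined)
```

In a real training loop the two inputs would be the per-batch outputs of the CTC and attention branches; the weight actually used by the authors is not stated in this listing.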

Code & Models

Repositories

david-gimeno/evaluating-end2end-spanish-lipreading
PyTorch · Official

Taxonomy

Methods: Softmax · Attention Is All You Need · Focus