LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

David Gimeno-Gómez; Carlos-D. Martínez-Hinarejos

arXiv:2311.12457 · cs.CV · November 22, 2023 · 5 cites

TL;DR

This paper introduces LIP-RTVE, a large-scale audiovisual Spanish database with semi-automatic annotations, enabling improved research in audiovisual speech recognition and multimodal speech processing.

Contribution

The paper presents a new, extensive audiovisual database of Spanish with semi-automatic annotations, filling a resource gap for non-English speech recognition research.

Findings

1. Baseline results with Hidden Markov Models demonstrate the database's utility.
2. The database covers 13 hours of natural Spanish speech from TV.
3. It supports both speaker-dependent and speaker-independent scenarios.
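The difference between the two evaluation scenarios comes down to how utterances are partitioned: in a speaker-dependent setup every speaker appears in both training and test data, while in a speaker-independent setup the test speakers are never seen during training. The following is a minimal Python sketch of both partitioning strategies, assuming a hypothetical list of (speaker, utterance) pairs. The IDs, the 20% test ratio, and the helper names are illustrative assumptions only; they do not reproduce the actual LIP-RTVE partitions, which are defined by the authors.

```python
import random
from collections import defaultdict

# Hypothetical corpus: 10 speakers x 20 utterances each. These IDs are
# placeholders, NOT the real LIP-RTVE naming scheme.
samples = [(f"spk{s:02d}", f"spk{s:02d}_utt{u:03d}")
           for s in range(10) for u in range(20)]


def speaker_dependent_split(samples, test_ratio=0.2, seed=0):
    """Split each speaker's utterances so every speaker is in train AND test."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for spk, utt in samples:
        by_speaker[spk].append((spk, utt))
    train, test = [], []
    for spk in sorted(by_speaker):
        utts = by_speaker[spk][:]
        rng.shuffle(utts)
        n_test = max(1, round(len(utts) * test_ratio))
        test.extend(utts[:n_test])
        train.extend(utts[n_test:])
    return train, test


def speaker_independent_split(samples, test_ratio=0.2, seed=0):
    """Hold out whole speakers so test speakers never occur in training."""
    rng = random.Random(seed)
    speakers = sorted({spk for spk, _ in samples})
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_ratio))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test


def speaker_overlap(train, test):
    """Speakers shared between the two partitions."""
    return {s for s, _ in train} & {s for s, _ in test}


sd_train, sd_test = speaker_dependent_split(samples)
si_train, si_test = speaker_independent_split(samples)
print(len(speaker_overlap(sd_train, sd_test)))  # 10
print(len(speaker_overlap(si_train, si_test)))  # 0
```

The overlap check is the quickest sanity test for either setup: a speaker-dependent split should share all speakers across partitions, and a speaker-independent split should share none. Speaker-independent results are typically the harder and more realistic measure, since the model cannot rely on speaker-specific lip appearance.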

Abstract

Speech is considered a multimodal process in which hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by reading the speaker's lips, has been a focus of interest in recent decades. Nevertheless, training these systems in the current Deep Learning era requires large-scale databases. Moreover, while most of these databases are dedicated to English, other languages lack sufficient resources. Thus, this paper presents a semi-automatically annotated audiovisual database to deal with unconstrained natural Spanish, providing 13 hours of data extracted from Spanish…

Code & Models

Repositories

david-gimeno/lip-rtve (official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Music and Audio Processing

Methods: Focus