LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

TL;DR
This paper introduces LIP-RTVE, a large-scale audiovisual Spanish database with semi-automatic annotations, enabling improved research in audiovisual speech recognition and multimodal speech processing.
Contribution
The paper presents a new extensive audiovisual Spanish database with semi-automatic annotations, filling a resource gap for non-English speech recognition research.
Findings
Baseline results with Hidden Markov Models demonstrate the database's utility.
The database covers 13 hours of natural Spanish speech from TV.
It supports both speaker-dependent and speaker-independent scenarios.
Abstract
Speech is considered a multi-modal process in which hearing and vision are two fundamental pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by reading the lips of the speaker, has been a focus of interest in recent decades. Nevertheless, in order to estimate these systems in the current Deep Learning era, large-scale databases are required. Moreover, while most of these databases are dedicated to English, other languages lack sufficient resources. Thus, this paper presents a semi-automatically annotated audiovisual database to deal with unconstrained natural Spanish, providing 13 hours of data extracted from Spanish…
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Music and Audio Processing
