LRWR: Large-Scale Benchmark for Lip Reading in Russian language
Evgeniy Egorov, Vasily Kostyumov, Mikhail Konyk, Sergey Kolesnikov

TL;DR
This paper introduces LRWR, a large-scale Russian lipreading dataset, evaluates current methods on it, highlights language-specific challenges, and reports new state-of-the-art results on the LRW benchmark.
Contribution
The paper presents the first large-scale Russian lipreading benchmark and provides a comprehensive analysis of existing methods on this dataset.
Findings
Current lipreading methods perform differently across languages.
The LRWR dataset reveals language-specific challenges in lipreading.
State-of-the-art results were achieved on the LRW benchmark.
Abstract
Lipreading, also known as visual speech recognition, aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas. One of the significant obstacles for research in this field is the lack of proper datasets for a wide variety of languages: so far, these methods have been focused only on English or Chinese. In this paper, we introduce a naturally distributed large-scale benchmark for lipreading in the Russian language, named LRWR, which contains 235 classes and 135 speakers. We provide a detailed description of the dataset collection pipeline and dataset statistics. We also present a comprehensive comparison of the current popular lipreading methods on LRWR and conduct a detailed analysis of their performance. The results demonstrate the differences between the benchmarked languages and provide several promising directions for lipreading models…
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Video Analysis and Summarization
