LRW-Persian: Lip-reading in the Wild Dataset for Persian Language
Zahra Taghizadeh, Mohammad Shahverdikondori, Arian Noori, Alireza Dadgarnia

TL;DR
LRW-Persian is the largest in-the-wild Persian lipreading dataset, enabling research in visual speech recognition for underrepresented languages through extensive data, automated curation, and baseline benchmarks.
Contribution
The paper introduces the first large-scale Persian lipreading dataset with comprehensive metadata, automated quality control, and baseline models for the language.
Findings
Established baseline lipreading performance on LRW-Persian.
Demonstrated the dataset's difficulty for current architectures.
Enabled cross-lingual transfer research in visual speech recognition.
Abstract
Lipreading has emerged as an increasingly important research area for developing robust speech recognition systems and assistive technologies for the hearing-impaired. However, non-English resources for visual speech recognition remain limited. We introduce LRW-Persian, the largest in-the-wild Persian word-level lipreading dataset, comprising target words and over video samples extracted from more than hours of footage across television programs. Designed as a benchmark-ready resource, LRW-Persian provides speaker-disjoint training and test splits, wide regional and dialectal coverage, and rich per-clip metadata including head pose, age, and gender. To ensure large-scale data quality, we establish a fully automated end-to-end curation pipeline encompassing transcription based on Automatic Speech Recognition (ASR), active-speaker localization, quality…
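To illustrate what a speaker-disjoint split means in practice, here is a minimal Python sketch. It is not the authors' procedure: the function name `speaker_disjoint_split`, the `(clip_id, speaker_id)` input format, and the `test_fraction` and `seed` parameters are all assumptions made for this example. The idea is to assign whole speakers, rather than individual clips, to either the training or the test set.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(clips, test_fraction=0.2, seed=0):
    """Split clips so that no speaker appears in both train and test.

    clips: iterable of (clip_id, speaker_id) pairs (hypothetical format).
    Returns (train_clip_ids, test_clip_ids).
    """
    # Group clip ids by speaker so that speakers are the unit of assignment.
    by_speaker = defaultdict(list)
    for clip_id, speaker_id in clips:
        by_speaker[speaker_id].append(clip_id)

    # Shuffle speakers deterministically, then hold out a fraction of them.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train, test = [], []
    for speaker in speakers:
        target = test if speaker in test_speakers else train
        target.extend(by_speaker[speaker])
    return train, test
```

Splitting at the speaker level means a model evaluated on the test set is scored on faces it never saw during training, which is the property a speaker-disjoint benchmark is meant to guarantee.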
