LRW-Persian: Lip-reading in the Wild Dataset for Persian Language

Zahra Taghizadeh; Mohammad Shahverdikondori; Arian Noori; Alireza Dadgarnia

arXiv:2510.22716 · cs.CV · October 28, 2025

TL;DR

LRW-Persian is the largest in-the-wild Persian lipreading dataset, enabling research in visual speech recognition for underrepresented languages through extensive data, automated curation, and baseline benchmarks.

Contribution

The paper introduces the first large-scale Persian lipreading dataset, with comprehensive per-clip metadata, automated quality control, and baseline models.

Findings

01. Established baseline lipreading performance on LRW-Persian.

02. Demonstrated the dataset's difficulty for current architectures.

03. Enabled cross-lingual transfer research in visual speech recognition.

Abstract

Lipreading has emerged as an increasingly important research area for developing robust speech recognition systems and assistive technologies for the hearing-impaired. However, non-English resources for visual speech recognition remain limited. We introduce LRW-Persian, the largest in-the-wild Persian word-level lipreading dataset, comprising 743 target words and over 414,000 video samples extracted from more than 1,900 hours of footage across 67 television programs. Designed as a benchmark-ready resource, LRW-Persian provides speaker-disjoint training and test splits, wide regional and dialectal coverage, and rich per-clip metadata including head pose, age, and gender. To ensure large-scale data quality, we establish a fully automated end-to-end curation pipeline encompassing transcription based on Automatic Speech Recognition (ASR), active-speaker localization, quality…
