Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro

TL;DR
This paper introduces a novel speaker-adaptive lip reading method that adapts to individual speakers at both visual and language levels, utilizing prompt tuning and LoRA, validated on a new diverse dataset for real-world sentence-level lip reading.
Contribution
It proposes a new speaker adaptation approach combining prompt tuning and LoRA, and introduces VoxLRS-SA, a large, diverse dataset for real-world, sentence-level lip reading.
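To make the two adaptation techniques named above concrete, here is a minimal pure-Python sketch of how LoRA and prompt tuning work in general. It is not the authors' implementation. The class `LoRALinear`, the helper `prepend_prompts`, the toy dimensions, and the choice to apply both to the same input sequence are all assumptions made for illustration; the summary does not specify where each technique sits in the model.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update A @ B.

    Only A and B would be trained for a target speaker; W stays fixed.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weight, kept frozen
        # A starts small and random, B starts at zero, so the initial
        # low-rank update is zero and the layer matches the frozen model.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        self.B = [[0.0] * d_out for _ in range(rank)]
        self.scaling = alpha / rank

    def forward(self, x):
        base = matmul(x, self.weight)
        update = matmul(matmul(x, self.A), self.B)
        return [
            [b + self.scaling * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, update)
        ]


def prepend_prompts(prompts, tokens):
    """Prompt tuning: put learnable speaker-specific vectors before the input tokens."""
    return prompts + tokens


# Toy example (all values are made up): 3 input frames with 4 features each.
frozen_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # 4 x 2
layer = LoRALinear(frozen_w, rank=2, alpha=4.0)
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
speaker_prompts = [[0.0] * 4, [0.0] * 4]  # 2 learnable prompt vectors

sequence = prepend_prompts(speaker_prompts, frames)  # 5 x 4
out = layer.forward(sequence)  # 5 x 2
```

In this sketch the per-speaker parameters are only `speaker_prompts`, `A`, and `B`, which is far fewer values than the frozen weight of a realistically sized layer. That small footprint is the usual motivation for using these techniques when only a little data per speaker is available.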
Findings
Existing speaker-adaptive methods also improve performance in in-the-wild settings.
The proposed method outperforms previous approaches.
VoxLRS-SA enables validation of lip reading in diverse real-world scenarios.
Abstract
Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information such as lip appearances. To address this challenge, speaker adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality. However, the effectiveness of adapting language information, such as vocabulary choice, of the target speaker has not been explored in previous works. Additionally, existing datasets for speaker adaptation have limited vocabulary sizes and pose variations, which restrict the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a…
Taxonomy
Topics: Subtitles and Audiovisual Media
