Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Jeong Hun Yeo; Chae Won Kim; Hyunjun Kim; Hyeongseop Rha; Seunghee Han; Wen-Huang Cheng; Yong Man Ro

arXiv:2409.00986 · cs.CV · January 3, 2025


TL;DR

This paper introduces a novel speaker-adaptive lip reading method that adapts to individual speakers at both visual and language levels, utilizing prompt tuning and LoRA, validated on a new diverse dataset for real-world sentence-level lip reading.
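The TL;DR names prompt tuning as one of the two adaptation tools. The sketch below illustrates the generic technique, not the authors' implementation: a few learnable vectors are prepended to a sequence of per-frame visual features, and only those vectors would be updated while the encoder stays frozen. This page does not say where the paper inserts its prompts or how many it uses, so `PromptTuner`, `num_prompts`, and the plain SGD `step` are hypothetical. It is written in plain Python rather than PyTorch so that it runs stand-alone.

```python
import random
from typing import List

Vector = List[float]


class PromptTuner:
    """Hypothetical prompt-tuning wrapper (illustrative, not the authors' code).

    Learnable prompt vectors are prepended to the per-frame features fed into
    a frozen encoder. During speaker adaptation only `self.prompts` would be
    updated; the encoder weights stay fixed.
    """

    def __init__(self, num_prompts: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # Small random initialization for the trainable prompt vectors.
        self.prompts: List[Vector] = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)
        ]

    def __call__(self, frames: List[Vector]) -> List[Vector]:
        # Every frame feature must match the prompt dimension.
        for frame in frames:
            if len(frame) != self.dim:
                raise ValueError(f"expected dim {self.dim}, got {len(frame)}")
        # Prepend copies of the prompts; the frame features are left untouched.
        return [list(p) for p in self.prompts] + [list(f) for f in frames]

    def step(self, grads: List[Vector], lr: float) -> None:
        # Plain SGD update applied to the prompts only (gradients assumed given).
        for p, g in zip(self.prompts, grads):
            for i in range(self.dim):
                p[i] -= lr * g[i]

    def num_trainable(self) -> int:
        return len(self.prompts) * self.dim


# Usage: 5 frames of 8-dim features plus 4 prompt vectors -> 9-step sequence.
frames = [[0.1 * t] * 8 for t in range(5)]
tuner = PromptTuner(num_prompts=4, dim=8)
sequence = tuner(frames)
print(len(sequence), tuner.num_trainable())  # 9 32
```

Only 32 numbers change per speaker in this toy setting, which is the appeal of prompt tuning for per-user adaptation: the shared backbone is stored once.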

Contribution

It proposes a new speaker adaptation approach combining prompt tuning and LoRA, and introduces VoxLRS-SA, a large, diverse dataset for real-world, sentence-level lip reading.
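LoRA, the other ingredient, freezes a pre-trained weight matrix W and learns a low-rank update, so an adapted layer computes Wx + (alpha / r) · B(Ax). The following is a minimal sketch of the standard LoRA formulation for a single dense layer, assuming nothing about which layers the paper actually adapts (this page does not say). It is not taken from the official repository; `LoRALinear`, `rank`, and `alpha` are illustrative names.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    # Dense matrix-vector product: (rows x cols) @ (cols,) -> (rows,).
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Illustrative LoRA linear layer: y = W x + (alpha / r) * B (A x).

    W is the frozen pre-trained weight (out_dim x in_dim). Only the low-rank
    factors A (r x in_dim) and B (out_dim x r) would be trained.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0) -> None:
        out_dim, in_dim = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen
        # A gets a small random init; B starts at zero so the adapted layer
        # initially reproduces the pre-trained layer exactly.
        self.A: Matrix = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: an 8x8 frozen layer adapted with rank 2.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [float(i) for i in range(8)]
print(layer(x) == matvec(W, x))  # True (B is zero at initialization)
print(layer.num_trainable(), 8 * 8)  # 32 64
```

Because only A and B are trained, each speaker costs 2 · r · d parameters for a d x d layer instead of d², and initializing B to zero means adaptation starts exactly from the pre-trained model.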

Findings

1. Existing methods improve with speaker adaptation in the wild.
2. The proposed method outperforms previous approaches.
3. VoxLRS-SA enables validation of lip reading in diverse real-world scenarios.

Abstract

Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information such as lip appearances. To address this challenge, speaker-adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality. However, the effectiveness of adapting the target speaker's language information, such as vocabulary choice, has not been explored in previous works. Additionally, existing datasets for speaker adaptation have limited vocabulary sizes and pose variations, which restrict the validation of previous speaker-adaptive methods in real-world scenarios. To address these issues, we propose a novel speaker-adaptive lip reading method that adapts a…


Code & Models

Repositories

jeonghun0716/personalized-lip-reading
PyTorch · Official


Taxonomy

Topics: Subtitles and Audiovisual Media