Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition

Jianrong Wang, Tong Wu, Shanyu Wang, Mei Yu, Qiang Fang, Ju Zhang, and Li Liu

arXiv:2010.06363 · cs.CV · October 14, 2020

TL;DR

This paper introduces a novel 3D lip motion network (3LMNet) for text-independent speaker recognition, leveraging 3D lip motion features and a regional feedback module to improve robustness over 2D methods across varied face orientations.

Contribution

The work proposes a new end-to-end 3D lip motion network with a regional feedback module and prior lip motion knowledge integration for improved speaker recognition.

Findings

01. 3LMNet outperforms baseline models such as LSTM, VGG-16, and ResNet-34.

02. It surpasses state-of-the-art 2D lip image methods.

03. Pre-processing techniques enhance dataset quality.

Abstract

Lip motion reflects the behavioral characteristics of speakers and can therefore serve as a new kind of biometric in speaker recognition. In the literature, many works have used two-dimensional (2D) lip images to recognize speakers in a text-dependent context. However, 2D lip images are easily affected by varying face orientations. To address this, we present a novel end-to-end 3D lip motion network (3LMNet) that uses sentence-level 3D lip motion (S3DLM) to recognize speakers in both text-independent and text-dependent contexts. A new regional feedback module (RFM) is proposed to attend to different lip regions. In addition, prior knowledge of lip motion is investigated to complement the RFM, with landmark-level and frame-level features merged to form a better feature representation. Moreover, we present two methods, i.e., coordinate transformation and face posture correction to…
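
The abstract's point that 2D lip images are sensitive to face orientation can be illustrated with a small geometric sketch. This is not the paper's method or data: the two lip-corner landmarks, their coordinates, the helper names (`rotate_yaw`, `project_2d`, `lip_widths`), and the orthographic projection are all assumptions made for illustration. The sketch rotates the head about the vertical axis and compares the mouth width measured in 3D with the width seen in a 2D image.

```python
import math

def rotate_yaw(point, angle_rad):
    """Rotate a 3D point (x, y, z) about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def project_2d(point):
    """Orthographic projection onto the image plane: drop depth (z)."""
    return point[:2]

# Hypothetical lip-corner landmarks in a frontal pose (arbitrary units).
LEFT_CORNER = (-25.0, 0.0, 0.0)
RIGHT_CORNER = (25.0, 0.0, 0.0)

def lip_widths(yaw_degrees):
    """Return (3D width, 2D projected width) after turning the head by yaw."""
    angle = math.radians(yaw_degrees)
    left = rotate_yaw(LEFT_CORNER, angle)
    right = rotate_yaw(RIGHT_CORNER, angle)
    width_3d = distance(left, right)
    width_2d = distance(project_2d(left), project_2d(right))
    return width_3d, width_2d

if __name__ == "__main__":
    for yaw in (0, 30, 60):
        w3, w2 = lip_widths(yaw)
        print(f"yaw={yaw:2d} deg  3D width={w3:.1f}  2D width={w2:.1f}")
```

Under these assumptions the 3D width is unchanged by the rotation, while the projected 2D width shrinks with the cosine of the yaw angle, which is one simple way to see why features measured on 3D landmarks can be more stable across head poses than features measured on 2D images.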

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory