Three-Dimensional Lip Motion Network for Text-Independent Speaker Recognition
Jianrong Wang, Tong Wu, Shanyu Wang, Mei Yu, Qiang Fang, Ju Zhang, and Li Liu

TL;DR
This paper introduces a novel 3D lip motion network (3LMNet) for text-independent speaker recognition, leveraging 3D lip motion features and a regional feedback module to improve robustness over 2D methods across varied face orientations.
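The abstract says the regional feedback module obtains attention over different lip regions. This page does not describe the RFM's internals, so the following is only a generic softmax attention over lip regions, not the paper's module. It is a minimal sketch that assumes each region has already been summarized into a fixed-length feature vector. The region names, the `regional_attention` function, and the tanh-plus-linear scoring vector are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regional_attention(
    region_feats: Dict[str, List[float]], score_weights: List[float]
) -> Tuple[Dict[str, float], List[float]]:
    """Score each lip region, normalize the scores, and pool the region features.

    Returns the per-region attention weights and the attention-weighted feature vector.
    Generic attention for illustration only; NOT the paper's RFM.
    """
    dim = len(score_weights)
    if not region_feats or any(len(v) != dim for v in region_feats.values()):
        raise ValueError("every region vector must match len(score_weights)")
    names = list(region_feats)
    # Relevance score: dot product of a hypothetical learned vector with tanh-squashed features.
    scores = [
        sum(w * math.tanh(x) for w, x in zip(score_weights, region_feats[n]))
        for n in names
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the region feature vectors.
    pooled = [0.0] * dim
    for a, n in zip(weights, names):
        for i, x in enumerate(region_feats[n]):
            pooled[i] += a * x
    return dict(zip(names, weights)), pooled
```

For example, with hypothetical regions `{"upper_lip": [1.0, 0.0], "lower_lip": [0.0, 1.0], "corners": [0.5, 0.5]}` and scoring vector `[1.0, 0.0]`, the upper lip receives the largest weight and dominates the pooled vector. In a trained network the scoring vector would be learned rather than fixed.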
Contribution
The work proposes a new end-to-end 3D lip motion network that combines a regional feedback module with prior knowledge of lip motion to improve speaker recognition.
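The abstract describes the prior lip-motion knowledge as merging landmark-level and frame-level features. No formulas are given here, so the sketch below is a hypothetical illustration only. It assumes a sentence is a sequence of frames, each holding the same number of 3D lip landmarks, and it uses simple displacement statistics as stand-ins for the learned features. The `landmark_level`, `frame_level`, and `fuse` functions are illustrative names, not taken from the official code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Sentence = List[List[Point]]  # frames x landmarks (hypothetical layout)


def _check(seq: Sentence) -> None:
    if len(seq) < 2:
        raise ValueError("need at least two frames to measure motion")
    n = len(seq[0])
    if n == 0 or any(len(frame) != n for frame in seq):
        raise ValueError("every frame must hold the same non-zero number of landmarks")


def landmark_level(seq: Sentence) -> List[float]:
    """One value per landmark: its mean displacement between consecutive frames."""
    _check(seq)
    steps = len(seq) - 1
    return [
        sum(math.dist(seq[t][k], seq[t + 1][k]) for t in range(steps)) / steps
        for k in range(len(seq[0]))
    ]


def frame_level(seq: Sentence) -> List[float]:
    """Mean and peak of the per-transition displacement averaged over landmarks."""
    _check(seq)
    per_step = [
        sum(math.dist(a, b) for a, b in zip(seq[t], seq[t + 1])) / len(seq[t])
        for t in range(len(seq) - 1)
    ]
    return [sum(per_step) / len(per_step), max(per_step)]


def fuse(seq: Sentence) -> List[float]:
    """Concatenate landmark-level and frame-level descriptors into one vector."""
    return landmark_level(seq) + frame_level(seq)
```

The landmark-level part captures which points on the lips move, while the frame-level part captures when the mouth moves most. Concatenation is the simplest way to merge the two; a learned fusion layer is an equally plausible alternative.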
Findings
3LMNet outperforms baseline models such as LSTM, VGG-16, and ResNet-34.
It surpasses state-of-the-art 2D lip image methods.
The proposed pre-processing methods (coordinate transformation and face posture correction) improve dataset quality.
Abstract
Lip motion reflects the behavioral characteristics of speakers and can therefore serve as a new kind of biometric for speaker recognition. Many previous works have used two-dimensional (2D) lip images to recognize speakers in a text-dependent context. However, 2D lip images are sensitive to variations in face orientation. To address this, we present a novel end-to-end 3D lip motion network (3LMNet) that uses sentence-level 3D lip motion (S3DLM) to recognize speakers in both text-independent and text-dependent contexts. A new regional feedback module (RFM) is proposed to obtain attention over different lip regions. In addition, prior knowledge of lip motion is investigated to complement the RFM: landmark-level and frame-level features are merged to form a better feature representation. Moreover, we present two methods, i.e., coordinate transformation and face posture correction to…
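The abstract names coordinate transformation and face posture correction but is cut off before the details. One plausible reading, assumed here and not taken from the paper, is to center the 3D landmarks and then rotate them so that the line between the two mouth corners lies along the x axis, which removes yaw and roll. Pitch about that corner axis cannot be recovered from two points, so this sketch leaves it uncorrected. The mouth-corner indices and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def center(points: List[Point]) -> List[Point]:
    """Coordinate transformation (assumed): move the landmark centroid to the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def unrotate_y(p: Point, theta: float) -> Point:
    """Rotate about the y axis so the x-z direction at angle theta maps onto +x."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + z * s, y, -x * s + z * c)


def unrotate_z(p: Point, phi: float) -> Point:
    """Rotate about the z axis so the x-y direction at angle phi maps onto +x."""
    x, y, z = p
    c, s = math.cos(phi), math.sin(phi)
    return (x * c + y * s, -x * s + y * c, z)


def normalize_pose(points: List[Point], left: int, right: int) -> List[Point]:
    """Posture correction (assumed): center, then align the left-to-right corner vector with +x."""
    pts = center(points)
    dx, _, dz = (pts[right][i] - pts[left][i] for i in range(3))
    yaw = math.atan2(dz, dx)  # angle of the corner vector in the x-z plane
    pts = [unrotate_y(p, yaw) for p in pts]
    dx = pts[right][0] - pts[left][0]
    dy = pts[right][1] - pts[left][1]
    roll = math.atan2(dy, dx)  # remaining tilt in the x-y plane
    return [unrotate_z(p, roll) for p in pts]
```

Because both rotations are about the origin, the centroid stays at zero after posture correction. A face that differs only by a head turn (yaw) maps back to the same normalized landmarks, which is the kind of orientation robustness the abstract motivates.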
Code & Models
- wutong18/Three-Dimensional-Lip-Motion-Network-for-Text-Independent-Speaker-Recognition (PyTorch, official)
- MindCode-4/code-13/tree/main/Three-Dimensional-Lip-Motion-Network-for-Text-Independent-Speaker-Recognition-master (MindSpore)
- pwc-1/Paper-9/tree/main/3/Three-Dimensional-Lip-Motion-Network-for-Text-Independent-Speaker-Recognition-master (MindSpore)
- MindCode-4/code-9/tree/main/Three-Dimensional-Lip-Motion-Network-for-Text-Independent-Speaker-Recognition-master (MindSpore)
- pwc-1/Paper-9/tree/main/4/Three-Dimensional-Lip-Motion-Network-for-Text-Independent-Speaker-Recognition-master (MindSpore)
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
