3D Lip Event Detection via Interframe Motion Divergence at Multiple Temporal Resolutions
Jie Zhang, Robert B. Fisher

TL;DR
This paper introduces a 3D lip event detection method that measures interframe motion divergence at multiple temporal resolutions to identify lip opening and closing events during speech.
Contribution
It proposes a 3D lip event detection pipeline that combines a motion divergence measure, computed from 3D lip landmarks, with a multi-temporal-resolution framework so that detection applies to different speaking speeds.
Findings
Reports state-of-the-art detection performance on the S3DFM Dataset.
Detects lip opening and closing events, with the multi-temporal-resolution framework designed to handle different speaking speeds.
Detects events across 100 sequences.
Abstract
The lip is a dominant dynamic facial unit when a person is speaking. Detecting lip events is beneficial for speech analysis and for supporting the hearing impaired. This paper proposes a 3D lip event detection pipeline that automatically determines the lip events from a 3D speaking lip sequence. We define a motion divergence measure using 3D lip landmarks to quantify the interframe dynamics of a 3D speaking lip. Then, we cast the interframe motion detection in a multi-temporal-resolution framework that allows the detection to be applicable to different speaking speeds. The experiments on the S3DFM Dataset investigate the overall 3D lip dynamics based on the proposed motion divergence. The proposed 3D pipeline is able to detect opening and closing lip events across 100 sequences, achieving state-of-the-art performance.
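Illustrative sketch
The abstract does not give the definition of the motion divergence measure, so the following Python sketch substitutes a simple stand-in rather than the authors' formulation. It assumes that a sequence is an array of shape (T, N, 3) holding N 3D lip landmarks per frame, that "divergence" can be approximated by the mean displacement of landmarks along the direction pointing away from the lip centroid (positive when the lip opens, negative when it closes), and that temporal resolution corresponds to the frame gap `step`. The names `motion_divergence`, `detect_events`, `steps`, and `threshold` are hypothetical.

```python
import numpy as np


def motion_divergence(seq, step=1):
    """Signed outward motion between frames t and t + step.

    ASSUMPTION: this radial-displacement score is a stand-in for the
    paper's motion divergence measure, whose definition is not given
    in the abstract.

    seq: array of shape (T, N, 3), N landmarks over T frames.
    Returns an array of shape (T - step,).
    """
    seq = np.asarray(seq, dtype=float)
    if not 1 <= step < seq.shape[0]:
        raise ValueError("step must satisfy 1 <= step < number of frames")

    prev, nxt = seq[:-step], seq[step:]
    displacement = nxt - prev                          # (T - step, N, 3)

    # Unit vectors from the lip centroid to each landmark in the earlier frame.
    radial = prev - prev.mean(axis=1, keepdims=True)
    lengths = np.linalg.norm(radial, axis=2, keepdims=True)
    radial = radial / np.maximum(lengths, 1e-12)

    # Project each displacement on its outward direction, then average.
    return (displacement * radial).sum(axis=2).mean(axis=1)


def detect_events(seq, steps=(1, 2, 4), threshold=0.5):
    """Label frame pairs at several frame gaps (temporal resolutions).

    Returns a dict mapping each step to an int array with
    +1 (opening), -1 (closing), or 0 (no event) per frame pair.
    ASSUMPTION: a fixed threshold is used purely for illustration.
    """
    events = {}
    for step in steps:
        score = motion_divergence(seq, step)
        events[step] = np.where(score > threshold, 1,
                                np.where(score < -threshold, -1, 0))
    return events
```

Larger values of `step` compare frames that are further apart, which makes slow lip motion easier to separate from noise, while small values respond to fast motion. How the paper combines the resolutions is not stated in the abstract, so the sketch simply returns one label array per resolution.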
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Indoor and Outdoor Localization Technologies
