HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods
Yongyuan Li, Xiuyuan Qin, Chao Liang, Mingqiang Wei

TL;DR
HDTR-Net is a real-time high-definition teeth restoration network that can be applied to arbitrary talking face generation methods. It improves visual quality while maintaining lip synchronization, does not sacrifice speed, and outperforms current super-resolution methods.
Contribution
The paper introduces HDTR-Net, a universal teeth restoration network that enhances teeth clarity in talking face videos in real-time, maintaining lip sync and temporal consistency.
Findings
Achieves real-time high-definition teeth restoration.
Speeds up inference by 300% compared to state-of-the-art super-resolution-based restoration methods.
Maintains lip synchronization and frame coherence.
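Read literally, a 300% increase in inference speed means four times the baseline throughput (the baseline plus 300% of it). The sketch below works through that arithmetic. The baseline frame rate and the 25 fps real-time threshold are illustrative assumptions, not figures reported in the paper.

```python
def speedup_factor(percent_increase: float) -> float:
    """Convert 'speed increased by N%' into a throughput multiplier."""
    return 1.0 + percent_increase / 100.0


def is_real_time(fps: float, target_fps: float = 25.0) -> bool:
    """Treat 'real time' as keeping up with the video frame rate.

    The 25 fps default is an assumption chosen for illustration.
    """
    return fps >= target_fps


# Hypothetical baseline throughput; not a number from the paper.
baseline_fps = 10.0

factor = speedup_factor(300.0)          # 300% faster -> 4x throughput
restored_fps = baseline_fps * factor    # throughput after the speedup

print(f"multiplier: {factor:.1f}x, throughput: {restored_fps:.1f} fps")
print("real time:", is_real_time(restored_fps))
```

Under these assumed numbers, a baseline that falls short of the frame rate would clear it after the speedup. Whether that holds in practice depends on the actual baseline and hardware.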
Abstract
Talking Face Generation (TFG) aims to reconstruct facial movements, achieving highly natural lip movements from audio and facial features that are potentially correlated. Existing TFG methods have made significant advances in producing natural and realistic images. However, most work rarely takes visual quality into consideration. It is challenging to ensure lip synchronization while avoiding visual quality degradation in cross-modal generation methods. To address this issue, we propose a universal High-Definition Teeth Restoration Network, dubbed HDTR-Net, for arbitrary TFG methods. HDTR-Net can enhance teeth regions at an extremely fast speed while maintaining synchronization and temporal consistency. In particular, we propose a Fine-Grained Feature Fusion (FGFF) module to effectively capture fine texture feature information around teeth and surrounding regions, and use these…
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
