R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning
Zhiling Ye, LiangGuo Zhang, Dingheng Zeng, Quan Lu, Ning, Jiang

TL;DR
R2-Talker is a real-time talking head synthesis framework that uses hash grid landmarks encoding and progressive multilayer conditioning to improve visual quality and efficiency in 3D portrait generation.
Contribution
It introduces a novel hash grid landmark encoding and multilayer conditioning scheme for NeRF-based talking head synthesis, enhancing quality and efficiency.
Findings
Superior visual quality compared to state-of-the-art methods
Enhanced generalizability due to decoupled input and conditional spaces
Significant computational efficiency improvements
Abstract
Dynamic NeRFs have recently garnered growing attention for 3D talking portrait synthesis. Despite advances in rendering speed and visual quality, challenges persist in enhancing efficiency and effectiveness. We present R2-Talker, an efficient and effective framework enabling realistic real-time talking head synthesis. Specifically, using multi-resolution hash grids, we introduce a novel approach for encoding facial landmarks as conditional features. This approach losslessly encodes landmark structures as conditional features, decoupling input diversity, and conditional spaces by mapping arbitrary landmarks to a unified feature space. We further propose a scheme of progressive multilayer conditioning in the NeRF rendering pipeline for effective conditional feature fusion. Our new approach has the following advantages as demonstrated by extensive experiments compared with the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFace recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
