R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning

Zhiling Ye; LiangGuo Zhang; Dingheng Zeng; Quan Lu; Ning Jiang

arXiv:2312.05572·cs.CV·December 12, 2023·1 cites

Open Access

TL;DR

R2-Talker is a real-time talking head synthesis framework that uses hash grid landmarks encoding and progressive multilayer conditioning to improve visual quality and efficiency in 3D portrait generation.

Contribution

It introduces a novel hash grid landmark encoding and multilayer conditioning scheme for NeRF-based talking head synthesis, enhancing quality and efficiency.
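The summary above does not say how the conditioning is wired into the network, so the following is only a minimal sketch of one plausible reading of "multilayer conditioning": the conditional feature is concatenated to the input of every hidden layer of an MLP rather than only the first. All names (`build_mlp`, `forward`), layer sizes, and the concatenation-based fusion are assumptions for illustration, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(in_dim, out_dim):
    """Random weights and zero bias for one dense layer (illustrative init)."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def build_mlp(x_dim, cond_dim, hidden_dim, num_layers, out_dim):
    """Every hidden layer receives its usual input plus the condition vector."""
    layers = []
    in_dim = x_dim
    for _ in range(num_layers):
        layers.append(init_layer(in_dim + cond_dim, hidden_dim))
        in_dim = hidden_dim
    head = init_layer(hidden_dim, out_dim)
    return layers, head

def forward(x, cond, layers, head):
    """x: (N, x_dim) point features, cond: (N, cond_dim) conditional features."""
    h = x
    for w, b in layers:
        # Assumed fusion rule: re-inject the condition at each layer by concatenation.
        h = np.maximum(np.concatenate([h, cond], axis=-1) @ w + b, 0.0)
    w, b = head
    return h @ w + b

# Hypothetical sizes: 32-d point features, 8-d landmark condition, 4 outputs (RGB + density).
layers, head = build_mlp(x_dim=32, cond_dim=8, hidden_dim=64, num_layers=3, out_dim=4)
x = rng.normal(size=(5, 32))
cond = rng.normal(size=(5, 8))
out = forward(x, cond, layers, head)
```

Re-injecting the condition at each depth keeps it from being diluted by the early layers, which is the usual motivation for this kind of design; whether the paper fuses features by concatenation, modulation, or another mechanism is not stated here.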

Findings

01. Superior visual quality compared to state-of-the-art methods

02. Enhanced generalizability due to decoupled input and conditional spaces

03. Significant computational efficiency improvements

Abstract

Dynamic NeRFs have recently garnered growing attention for 3D talking portrait synthesis. Despite advances in rendering speed and visual quality, challenges persist in enhancing efficiency and effectiveness. We present R2-Talker, an efficient and effective framework enabling realistic real-time talking head synthesis. Specifically, using multi-resolution hash grids, we introduce a novel approach for encoding facial landmarks as conditional features. This approach losslessly encodes landmark structures as conditional features, decoupling input diversity and conditional spaces by mapping arbitrary landmarks to a unified feature space. We further propose a scheme of progressive multilayer conditioning in the NeRF rendering pipeline for effective conditional feature fusion. Our new approach has the following advantages as demonstrated by extensive experiments compared with the…
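To make the idea of encoding landmarks with multi-resolution hash grids concrete, here is a minimal NumPy sketch of a generic Instant-NGP-style encoder applied to 3D landmark coordinates. It is not the authors' implementation: the hyperparameters (`NUM_LEVELS`, `TABLE_SIZE`, `FEATURE_DIM`, `BASE_RES`, `GROWTH`), the hashing primes, the function names, and the assumption that landmarks are normalized to [0, 1] are all illustrative choices, and the tables are randomly initialized rather than learned.

```python
import numpy as np

# Hypothetical hyperparameters, not taken from the paper.
NUM_LEVELS = 4
TABLE_SIZE = 2 ** 14
FEATURE_DIM = 2
BASE_RES = 16
GROWTH = 2.0
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

rng = np.random.default_rng(0)
# One feature table per resolution level; these would be trainable in practice.
tables = rng.uniform(-1e-4, 1e-4, size=(NUM_LEVELS, TABLE_SIZE, FEATURE_DIM))

def hash_vertices(ijk):
    """Spatial hash of non-negative integer grid vertices, (..., 3) -> (...)."""
    ijk = ijk.astype(np.uint64)
    h = ijk[..., 0] * PRIMES[0] ^ ijk[..., 1] * PRIMES[1] ^ ijk[..., 2] * PRIMES[2]
    return (h % np.uint64(TABLE_SIZE)).astype(np.int64)

def encode_landmarks(landmarks):
    """landmarks: (N, 3) in [0, 1]. Returns (N, NUM_LEVELS * FEATURE_DIM)."""
    landmarks = np.asarray(landmarks, dtype=np.float64)
    corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    feats = []
    for level in range(NUM_LEVELS):
        res = int(BASE_RES * GROWTH ** level)
        scaled = landmarks * res
        base = np.floor(scaled).astype(np.int64)   # lower corner of the enclosing cell
        frac = scaled - base                        # position inside the cell, (N, 3)
        level_feat = np.zeros((landmarks.shape[0], FEATURE_DIM))
        for c in corners:
            idx = hash_vertices(base + c)
            # Trilinear weight of this corner for every landmark.
            w = np.prod(np.where(c == 1, frac, 1.0 - frac), axis=-1)
            level_feat += w[:, None] * tables[level, idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=-1)

# 68 synthetic landmarks stand in for a detected facial landmark set.
lm = np.random.default_rng(1).uniform(0.0, 1.0, size=(68, 3))
cond = encode_landmarks(lm)
```

Because each landmark is looked up independently and mapped to a fixed-size feature, any landmark set lands in the same feature space regardless of its source, which is one way to read the abstract's "unified feature space".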

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings