CAG-Avatar: Cross-Attention Guided Gaussian Avatars for High-Fidelity Head Reconstruction

Zhe Chang; Haodong Jin; Yan Song; Hui Yu

arXiv:2601.14844·cs.GR·January 22, 2026

TL;DR

CAG-Avatar introduces a cross-attention-based adaptive framework for 3D head avatars that improves local detail modeling and reconstruction fidelity for real-time digital animation.

Contribution

It proposes a novel Conditionally Adaptive Fusion Module that enables Gaussian primitives to adaptively respond to expression signals, enhancing facial region dynamics modeling.

Findings

1. Improved reconstruction fidelity, especially for teeth and detailed regions.

2. Maintains real-time rendering performance.

3. Outperforms existing methods in visual quality and detail accuracy.

Abstract

Creating high-fidelity, real-time drivable 3D head avatars is a core challenge in digital animation. While 3D Gaussian Splatting (3D-GS) offers unprecedented rendering speed and quality, current animation techniques often rely on a "one-size-fits-all" global tuning approach, where all Gaussian primitives are uniformly driven by a single expression code. This simplistic approach fails to unravel the distinct dynamics of different facial regions, such as deformable skin versus rigid teeth, leading to significant blurring and distortion artifacts. We introduce Conditionally-Adaptive Gaussian Avatars (CAG-Avatar), a framework that resolves this key limitation. At its core is a Conditionally Adaptive Fusion Module built on cross-attention. This mechanism empowers each 3D Gaussian to act as a query, adaptively extracting relevant driving signals from the global expression code based on its…
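The query mechanism described in the abstract can be illustrated with a minimal sketch of scaled dot-product cross-attention. This is not the authors' implementation: the split of the expression code into tokens, the per-Gaussian query features, the identity key/value projections, and the names `cross_attend`, `gaussian_queries`, and `expression_tokens` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(gaussian_queries, expression_tokens):
    """Each Gaussian's query attends over the expression tokens.

    Assumption: keys and values are the tokens themselves (identity
    projections); a trained model would use learned projections.
    Returns one driving signal and one attention-weight list per Gaussian.
    """
    d = len(expression_tokens[0])
    scale = 1.0 / math.sqrt(d)
    signals, all_weights = [], []
    for q in gaussian_queries:
        # Similarity of this Gaussian's query to every expression token.
        weights = softmax([dot(q, k) * scale for k in expression_tokens])
        # Weighted sum of token values: the per-Gaussian driving signal.
        signal = [
            sum(w * v[j] for w, v in zip(weights, expression_tokens))
            for j in range(d)
        ]
        signals.append(signal)
        all_weights.append(weights)
    return signals, all_weights


# Hypothetical toy data: two expression tokens, two Gaussians.
expression_tokens = [[1.0, 0.0], [0.0, 1.0]]
gaussian_queries = [[4.0, 0.0], [0.0, 0.0]]
signals, weights = cross_attend(gaussian_queries, expression_tokens)
```

In this toy setup the first Gaussian's query aligns with the first token and so draws most of its signal from it, while the second Gaussian's zero query weights both tokens equally. That per-primitive difference is the contrast with a globally shared driving signal that the abstract describes.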

Taxonomy

Topics: Face recognition and analysis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis