CAG-Avatar: Cross-Attention Guided Gaussian Avatars for High-Fidelity Head Reconstruction
Zhe Chang, Haodong Jin, Yan Song, Hui Yu

TL;DR
CAG-Avatar introduces a cross-attention-based adaptive framework for 3D Gaussian head avatars that improves local detail modeling and reconstruction fidelity while remaining fast enough for real-time digital animation.
Contribution
The paper proposes a Conditionally Adaptive Fusion Module, built on cross-attention, that lets each Gaussian primitive respond adaptively to the expression signal, improving how the distinct dynamics of different facial regions are modeled.
Findings
Improves reconstruction fidelity, especially for teeth and other fine-detail regions.
Maintains real-time rendering performance.
Outperforms existing methods in visual quality and detail accuracy.
Abstract
Creating high-fidelity, real-time drivable 3D head avatars is a core challenge in digital animation. While 3D Gaussian Splatting (3D-GS) offers unprecedented rendering speed and quality, current animation techniques often rely on a "one-size-fits-all" global tuning approach, in which all Gaussian primitives are uniformly driven by a single expression code. This simplistic approach fails to capture the distinct dynamics of different facial regions, such as deformable skin versus rigid teeth, leading to significant blurring and distortion artifacts. We introduce Conditionally-Adaptive Gaussian Avatars (CAG-Avatar), a framework that resolves this key limitation. At its core is a Conditionally Adaptive Fusion Module built on cross-attention. This mechanism empowers each 3D Gaussian to act as a query, adaptively extracting relevant driving signals from the global expression code based on its…
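The abstract describes each Gaussian acting as a query that pulls its own driving signal out of a global expression code. The sketch below illustrates that idea with generic single-head scaled dot-product cross-attention, under assumptions that are not stated in the source: the expression code is split into a small set of tokens, each Gaussian carries a feature vector, and the projection matrices are random stand-ins for learned weights. All names (`CrossAttentionFusion`, `global_conditioning`, `feat_dim`, `expr_dim`, `attn_dim`) are hypothetical; this is not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Stand-in for learned weights (untrained, random)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class CrossAttentionFusion:
    """Hypothetical single-head cross-attention: each Gaussian queries expression tokens."""

    def __init__(self, feat_dim: int, expr_dim: int, attn_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_q = random_matrix(attn_dim, feat_dim, rng)  # Gaussian feature -> query
        self.w_k = random_matrix(attn_dim, expr_dim, rng)  # expression token -> key
        self.w_v = random_matrix(attn_dim, expr_dim, rng)  # expression token -> value
        self.scale = 1.0 / math.sqrt(attn_dim)

    def __call__(self, gaussian_feats: List[Vector],
                 expr_tokens: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        keys = [matvec(self.w_k, t) for t in expr_tokens]
        values = [matvec(self.w_v, t) for t in expr_tokens]
        outputs, weights = [], []
        for f in gaussian_feats:
            q = matvec(self.w_q, f)
            # Scaled dot-product score between this Gaussian's query and every key.
            scores = [self.scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
            a = softmax(scores)
            # Attention-weighted sum of values: this Gaussian's own driving signal.
            out = [sum(a[j] * values[j][d] for j in range(len(values)))
                   for d in range(len(values[0]))]
            outputs.append(out)
            weights.append(a)
        return outputs, weights


def global_conditioning(num_gaussians: int, expr_code: Vector) -> List[Vector]:
    """'One-size-fits-all' baseline: every Gaussian receives the same code."""
    return [list(expr_code) for _ in range(num_gaussians)]


# Toy usage: 4 Gaussians with 8-D features, expression code split into 5 tokens of 6-D.
rng = random.Random(1)
gaussian_feats = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
expr_tokens = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(5)]

fusion = CrossAttentionFusion(feat_dim=8, expr_dim=6, attn_dim=4)
cond, attn = fusion(gaussian_feats, expr_tokens)
baseline = global_conditioning(4, [x for tok in expr_tokens for x in tok])

print(len(cond), len(cond[0]))  # 4 4
print(all(row == baseline[0] for row in baseline))  # True
```

Because each Gaussian forms its own query, each receives its own mixture of expression tokens, whereas the uniform baseline hands every Gaussian an identical vector. How the paper actually tokenizes the expression code, and which Gaussian attributes the fused signal drives, is not stated in this excerpt.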
Taxonomy
Topics: Face recognition and analysis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
