Large-scale Codec Avatars: The Unreasonable Effectiveness of Large-scale Avatar Pretraining

Junxuan Li; Rawal Khirodkar; Chengan He; Zhongshi Jiang; Giljoo Nam; Lingchen Yang; Jihyun Lee; Egor Zakharov; Zhaoen Su; Rinat Abdrashitov; Yuan Dong; Julieta Martinez; Kai Li; Qingyang Tan; Takaaki Shiratori; Matthew Hu; Peihong Guo; Xuhua Huang; Ariyan Zarei; Marco Pesavento; Yichen Xu; He Wen; Teng Deng; Wyatt Borsos; Anjali Thakrar; Jean-Charles Bazin; Carsten Stoll; Ginés Hidalgo; James Booth; Lucy Wang; Xiaowen Ma; Yu Rong; Sairanjith Thalanki; Chen Cao; Christian Häne; Abhishek Kar; Sofien Bouaziz; Jason Saragih; Yaser Sheikh; Shunsuke Saito

arXiv:2604.02320·cs.CV·April 8, 2026

TL;DR

This paper introduces Large-Scale Codec Avatars (LCA), a high-fidelity 3D avatar model trained on large-scale in-the-wild data and high-quality curated data, achieving broad generalization and detailed control.

Contribution

It presents a novel pre/post-training paradigm for 3D avatar modeling at scale, combining large-scale in-the-wild pretraining with targeted fine-tuning for high quality.

Findings

01 LCA generalizes across diverse appearances and demographics.

02 LCA exhibits emergent capabilities like relightability and loose garment support.

03 LCA shows zero-shot robustness to stylized imagery.

Abstract

High-quality 3D avatar modeling faces a critical trade-off between fidelity and generalization. On the one hand, multi-view studio data enables high-fidelity modeling of humans with precise control over expressions and poses, but it struggles to generalize to real-world data due to limited scale and the domain gap between the studio environment and the real world. On the other hand, recent large-scale avatar models trained on millions of in-the-wild samples show promise for generalization across a wide range of identities, yet the resulting avatars are often of low quality due to inherent 3D ambiguities. To address this, we present Large-Scale Codec Avatars (LCA), a high-fidelity, full-body 3D avatar model that generalizes to world-scale populations in a feedforward manner, enabling efficient inference. Inspired by the success of large language models and vision foundation models, we…
