ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
Mingzhi Zhu, Ding Shang, Sai Qian Zhang

TL;DR
ESCA introduces a comprehensive framework combining algorithm and hardware optimizations to enable real-time, high-quality Codec Avatar rendering on resource-limited VR devices, enhancing immersive communication.
Contribution
The paper presents a full-stack optimization approach that pairs a tailored post-training quantization method with a custom hardware accelerator for efficient Photorealistic Codec Avatar (PCA) inference in VR.
Findings
Up to 0.39 improvement in FovVideoVDP scores
3.36x reduction in latency
Achieves a 100 fps rendering rate in end-to-end tests
Abstract
Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and interaction through deep learning-based generative models. However, these models impose significant computational demands, making real-time inference challenging on resource-constrained VR devices such as head-mounted displays, where latency and power efficiency are critical. To address this challenge, we propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. In addition, we design a custom hardware accelerator that can be integrated into the system-on-chip of VR devices to further enhance processing efficiency. Building on these components, we introduce ESCA, a full-stack optimization…
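The abstract does not describe how ESCA's PTQ method works internally. As a generic illustration of what post-training quantization involves, here is a minimal sketch of the textbook baseline: symmetric, per-tensor INT8 quantization with min-max calibration. It is not the paper's tailored method, and the function names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical.

```python
from typing import List


def calibrate_scale(samples: List[float], num_bits: int = 8) -> float:
    """Symmetric min-max calibration (generic baseline, not ESCA's method).

    Maps the largest observed magnitude in the calibration data to the
    largest positive signed integer (127 for 8 bits).
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(x) for x in samples)
    # Guard against an all-zero tensor, which would give a zero scale.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round each value to the nearest integer step, clamping to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    # Hypothetical weights standing in for one layer of a generative model.
    weights = [0.5, -1.27, 0.02, 1.0]
    scale = calibrate_scale(weights)
    q = quantize(weights, scale)
    recon = dequantize(q, scale)
    print(q)
    print([round(r, 4) for r in recon])
```

PTQ needs only a small calibration set and no retraining, which is why it suits already-trained models. Per-tensor min-max scaling is the simplest choice; per-channel scales or outlier-aware calibration usually preserve quality better, and a method tailored to Codec Avatar models would presumably go further than this baseline.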
