Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization

Fangyu Lin; Yingdong Hu; Zhening Liu; Yufan Zhuang; Zehong Lin; Jun Zhang

arXiv:2601.07518 · cs.CV · January 13, 2026

TL;DR

Mon3tr introduces a monocular 3D telepresence system that creates lifelike avatars using Gaussian splatting, enabling real-time, bandwidth-efficient remote interaction on mobile devices.

Contribution

Mon3tr is the first to bring 3D Gaussian splatting (3DGS)-based parametric human modeling into telepresence, pairing monocular input with an amortized computation strategy to produce efficient, high-quality avatars.
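One way to read "amortization" here: the multi-view avatar reconstruction is paid for once, while every live frame pays only for monocular inference. The notation below is ours, not the paper's: C_off is the one-time offline reconstruction cost, c_on is the per-frame online inference cost, and N is the number of frames for which the same avatar is reused.

```latex
\bar{C}(N) = \frac{C_{\text{off}}}{N} + c_{\text{on}},
\qquad
\lim_{N \to \infty} \bar{C}(N) = c_{\text{on}}
```

The longer (or more often) one pre-built avatar is reused, the closer the average per-frame cost gets to the cost of online inference alone.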

Findings

1. Achieves > 28 dB PSNR on novel poses
2. Operates at ~80 ms latency
3. Reduces bandwidth by over 1000x compared with point-cloud streaming
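To help read the first finding, here is a minimal PSNR helper using the standard definition 10 · log10(MAX² / MSE) over flattened images with intensities in [0, 1]. The function names are our own, and this is not the paper's evaluation code, which may differ in masking, color space, or cropping.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel/channel values
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rmse_at_psnr(target_db: float, max_value: float = 1.0) -> float:
    """Root-mean-square error that corresponds exactly to a given PSNR."""
    # PSNR = 20 * log10(MAX / RMSE)  =>  RMSE = MAX * 10^(-PSNR / 20)
    return max_value * 10.0 ** (-target_db / 20.0)
```

With intensities in [0, 1], 28 dB corresponds to an RMSE of about 0.040. The > 28 dB figure therefore means rendered novel poses deviate from ground truth by less than roughly 4% of the intensity range in the RMS sense.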

Abstract

Immersive telepresence aims to transform human interaction in AR/VR applications by enabling lifelike full-body holographic representations for enhanced remote collaboration. However, existing systems rely on hardware-intensive multi-camera setups and demand high bandwidth for volumetric streaming, limiting their real-time performance on mobile devices. To overcome these challenges, we propose Mon3tr, a novel Monocular 3D telepresence framework that integrates 3D Gaussian splatting (3DGS) based parametric human modeling into telepresence for the first time. Mon3tr adopts an amortized computation strategy, dividing the process into a one-time offline multi-view reconstruction phase to build a user-specific avatar and a monocular online inference phase during live telepresence sessions. A single monocular RGB camera is used to capture body motions and facial expressions in real time to…
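Here is a back-of-the-envelope sketch of why the offline/online split can save bandwidth. Assuming the receiver already holds the pre-built avatar (our reading; the truncated abstract does not spell this out), each live frame only needs the motion parameters inferred from the monocular camera, not the geometry itself. The packet layout (55 joint rotations, 100 expression coefficients, and a root translation, all float32) and the 200,000-point colored point cloud it is compared against are hypothetical sizes chosen for illustration. They are not taken from the paper, and `FrameParams` and the helper functions are our own names.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical sizes, loosely modeled on SMPL-X-style parametric bodies; NOT from the paper.
NUM_JOINTS = 55
NUM_EXPRESSION_COEFFS = 100


@dataclass
class FrameParams:
    """Per-frame motion parameters from one monocular RGB frame (hypothetical layout)."""
    joint_rotations: List[float]  # NUM_JOINTS * 3 axis-angle values
    expression: List[float]       # NUM_EXPRESSION_COEFFS values
    translation: List[float]      # global root translation (x, y, z)

    def pack(self) -> bytes:
        # Little-endian float32 with no padding: 4 bytes per value
        values = [*self.joint_rotations, *self.expression, *self.translation]
        return struct.pack(f"<{len(values)}f", *values)


def point_cloud_frame_bytes(num_points: int) -> int:
    # xyz as three float32 (12 bytes) plus RGB as three uint8 (3 bytes) per point
    return num_points * (3 * 4 + 3)


def make_zero_frame() -> FrameParams:
    return FrameParams(
        joint_rotations=[0.0] * (NUM_JOINTS * 3),
        expression=[0.0] * NUM_EXPRESSION_COEFFS,
        translation=[0.0, 0.0, 0.0],
    )


packet = make_zero_frame().pack()
cloud = point_cloud_frame_bytes(200_000)
print(f"{len(packet)} B vs {cloud} B -> ~{cloud / len(packet):.0f}x")
```

Under these assumed sizes, a 1,072-byte parameter packet replaces a 3,000,000-byte point-cloud frame, a ratio of roughly 2,800x. That is in the same regime as the reported > 1000x reduction, but it illustrates the mechanism rather than reproducing the paper's measurement.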

Taxonomy

Topics: Human Pose and Action Recognition · Virtual Reality Applications and Impacts · 3D Shape Modeling and Analysis