HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

TL;DR
HUGS introduces a fast, monocular video-based method for creating animatable 3D human models with scene context using Gaussian splatting, enabling real-time rendering and novel pose synthesis.
Contribution
This work presents Human Gaussian Splats (HUGS), a novel approach that automatically disentangles the static scene and an animatable human from a short monocular video, using 3D Gaussian splatting with optimized linear blend skinning weights.
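HUGS animates the human Gaussians with linear blend skinning (LBS) and optimizes the skinning weights instead of keeping them fixed. The following is a minimal illustration of that idea under stated assumptions, not the authors' code. Each canonical Gaussian center is moved by a weighted blend of per-joint 4x4 rigid transforms, with the weights produced by a softmax over learnable per-Gaussian logits. The softmax parameterization and the names `lbs_center`, `blend_transforms`, and `translation` are hypothetical choices made here for illustration.

```python
import math
from typing import List, Sequence

Matrix4 = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: skinning weights are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def translation(tx: float, ty: float, tz: float) -> Matrix4:
    """Rigid 4x4 transform that only translates (convenient for examples)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def blend_transforms(weights: Sequence[float],
                     transforms: Sequence[Matrix4]) -> Matrix4:
    """Weighted sum of joint transforms: T = sum_j w_j * G_j."""
    return [[sum(w * G[r][c] for w, G in zip(weights, transforms))
             for c in range(4)]
            for r in range(4)]


def lbs_center(center: Sequence[float],
               weight_logits: Sequence[float],
               joint_transforms: Sequence[Matrix4]) -> List[float]:
    """Move one canonical Gaussian center into posed space via LBS.

    center:           (x, y, z) in the canonical (rest-pose) space
    weight_logits:    one learnable logit per joint (hypothetical parameterization)
    joint_transforms: one 4x4 canonical-to-posed transform per joint
    """
    weights = softmax(weight_logits)
    T = blend_transforms(weights, joint_transforms)
    homo = (center[0], center[1], center[2], 1.0)  # homogeneous coordinates
    # Apply the blended transform and drop the homogeneous coordinate.
    return [sum(T[r][c] * homo[c] for c in range(4)) for r in range(3)]
```

With two joints where the second translates by +1 along x, a Gaussian with equal logits moves halfway, to (0.5, 0, 0). A fuller version would also rotate each Gaussian's covariance by the 3x3 part of its blended transform, and during training the logits would be updated by gradient descent through the renderer.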
Findings
Achieves 60 FPS rendering speed.
Trains in approximately 30 minutes from 50-100 frames.
Outperforms previous methods in quality and speed.
Abstract
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS), which represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of frames (50-100) and automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We use the SMPL body model to initialize the human Gaussians. To capture details that SMPL does not model (e.g., clothing and hair), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings…
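Because HUGS represents both the human and the scene with 3D Gaussians, the two sets can be rendered together with the standard 3DGS compositing rule, C = Σ_i c_i α_i Π_{j<i} (1 − α_j), applied to Gaussians sorted front to back. The following is a per-pixel sketch of that rule under simplifying assumptions: each Gaussian's projected opacity and color at the pixel are already known (stored in the hypothetical `Splat` record), and sorting is done per ray rather than per screen tile as in the original 3DGS rasterizer. The `composite` function and its `min_transmittance` cutoff are illustrative names, not part of any published API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


@dataclass
class Splat:
    """One Gaussian's contribution at a single pixel (hypothetical record)."""
    depth: float  # view-space depth of the Gaussian center
    alpha: float  # opacity times 2D Gaussian falloff at this pixel, in [0, 1]
    color: Color  # RGB, already evaluated for this view direction


def composite(human: Sequence[Splat], scene: Sequence[Splat],
              min_transmittance: float = 1e-4) -> Tuple[Color, float]:
    """Front-to-back alpha blending of human and scene splats along one ray.

    Returns the pixel color and the transmittance left after blending.
    """
    # Human and scene Gaussians share one depth ordering, so occlusion
    # between the avatar and the environment falls out of a single sort.
    splats = sorted(list(human) + list(scene), key=lambda s: s.depth)
    r = g = b = 0.0
    transmittance = 1.0
    for s in splats:
        weight = s.alpha * transmittance  # alpha_i * prod_{j<i} (1 - alpha_j)
        r += weight * s.color[0]
        g += weight * s.color[1]
        b += weight * s.color[2]
        transmittance *= 1.0 - s.alpha
        if transmittance < min_transmittance:  # stop once the ray is saturated
            break
    return (r, g, b), transmittance
```

With a half-transparent red human Gaussian in front of an opaque blue scene Gaussian, the pixel comes out as (0.5, 0.0, 0.5); if an opaque scene Gaussian sits in front of the human instead, the human contributes nothing to that pixel.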
Taxonomy
Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
