From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
Chenjian Gao, Lihe Ding, Rui Han, Zhanpeng Huang, Zibin Wang, Tianfan Xue

TL;DR
This paper presents a hybrid approach combining 3D Gaussian Splatting and 2D diffusion models to insert realistic, temporally consistent bracelets into videos, addressing challenges in lighting, motion, and perspective changes.
Contribution
The paper introduces a pipeline that combines 3D rendering with 2D diffusion for photorealistic, temporally coherent object insertion in videos, specifically for bracelets.
Findings
Enhanced temporal consistency in video insertion
Improved photorealistic lighting effects
First to combine 3D Gaussian Splatting with diffusion models for this task
Abstract
Inserting 3D objects into videos is a longstanding challenge in computer graphics with applications in augmented reality, virtual try-on, and video composition. Achieving both temporal consistency and realistic lighting remains difficult, particularly in dynamic scenarios with complex object motion, perspective changes, and varying illumination. While 2D diffusion models have shown promise for producing photorealistic edits, they often struggle to maintain temporal coherence across frames. Conversely, traditional 3D rendering methods excel in spatial and temporal consistency but fall short in achieving photorealistic lighting. In this work, we propose a hybrid object insertion pipeline that combines the strengths of both paradigms. Specifically, we focus on inserting bracelets into dynamic wrist scenes, leveraging the high temporal consistency of 3D Gaussian Splatting (3DGS) for…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
