
TL;DR
This paper presents a method to analyze human and object motion in videos from a single camera to infer scene depth, lighting, and occlusion, enabling realistic object insertion with minimal manual effort.
Contribution
It introduces an automated approach for scene understanding and object compositing that accurately models lighting, shadows, and occlusion from monocular video data.
Findings
Effective depth and lighting inference from monocular video
Automated realistic object insertion with proper occlusion and shadows
Comparison showing improved realism over existing methods
Abstract
By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism. In particular, when a user places a new object (a 2D cut-out) in the image, it is automatically rescaled, relit, and occluded properly, and it casts realistic shadows that fall in the correct direction relative to the sun and conform to the scene geometry. We demonstrate results (best viewed in the supplementary video) on a range of scenes and compare to alternative methods for depth estimation and shadow compositing.
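To make the "automatically rescaled" step concrete, here is a minimal sketch of one way observed people could set the scale of an inserted cut-out. It is not the paper's method. It assumes a flat ground plane, an upright camera, and people of roughly equal real height, so that a person's pixel height is approximately linear in the image row of their feet. The function names (`fit_height_model`, `cutout_scale`) and the sample observations are hypothetical.

```python
from typing import List, Tuple


def fit_height_model(observations: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of pixel_height = a * y_foot + b.

    Each observation is (y_foot, pixel_height) for one detected person:
    the image row of the feet and the person's height in pixels.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_y = sum(y for y, _ in observations) / n
    mean_h = sum(h for _, h in observations) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in observations)
    if var_y == 0:
        raise ValueError("observations must span more than one image row")
    cov = sum((y - mean_y) * (h - mean_h) for y, h in observations)
    a = cov / var_y
    b = mean_h - a * mean_y
    return a, b


def cutout_scale(
    model: Tuple[float, float],
    y_foot: float,
    cutout_height_px: float,
    relative_height: float = 1.0,
) -> float:
    """Scale factor for a cut-out whose base is placed at image row y_foot.

    relative_height is the object's real height as a fraction of a
    typical person's height (an assumed, user-supplied value).
    """
    a, b = model
    expected_person_px = a * y_foot + b
    if expected_person_px <= 0:
        raise ValueError("placement is at or above the estimated horizon")
    return relative_height * expected_person_px / cutout_height_px


# Hypothetical detections: people appear taller lower in the frame.
model = fit_height_model([(200.0, 50.0), (400.0, 150.0), (600.0, 250.0)])
scale = cutout_scale(model, y_foot=500.0, cutout_height_px=400.0)
```

With these sample values a person standing at row 500 would be about 200 pixels tall, so a 400-pixel cut-out of person-like height is scaled by one half. A real scene with sloped or multi-level ground would need a richer model than a single line.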
