Scale-aware Insertion of Virtual Objects in Monocular Videos
Songhai Zhang, Xiangli Li, Yingtian Liu, Hongbo Fu

TL;DR
This paper introduces a scale-aware approach for inserting virtual objects into monocular videos by estimating global scene scale using a Bayesian method and a new object size dataset, improving realism and robustness.
Contribution
It presents a novel Bayesian scale estimation method incorporating object size priors and introduces Metric-Tree, a hierarchical dataset of object sizes for over 900 categories.
Findings
Outperforms state-of-the-art scale estimation methods
Demonstrates robustness across various video scenes
Provides a new dataset for object size priors
Abstract
In this paper, we propose a scale-aware method for inserting virtual objects with proper sizes into monocular videos. To tackle the scale ambiguity problem of geometry recovery from monocular videos, we estimate the global scale of the objects in a video with a Bayesian approach that incorporates the size priors of objects: the sizes of all scene objects must strictly conform to the same global scale, and the probability of that global scale is maximized according to the size distributions of the object categories. To do so, we propose a dataset of sizes of object categories: Metric-Tree, a hierarchical representation of the sizes of more than 900 object categories with the corresponding images. To handle the incompleteness of objects recovered from videos, we propose a novel scale estimation method that extracts plausible dimensions of objects for scale optimization. Experiments have shown that our method…
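The idea of one global scale constrained by per-category size priors can be illustrated with a small sketch. This is not the authors' implementation: it assumes each category's metric size is Gaussian in log space (a log-normal prior), ignores the handling of incomplete objects, and uses made-up prior values rather than Metric-Tree entries. The names SIZE_PRIORS and estimate_global_scale are hypothetical. Under these assumptions the most probable scale has a closed form, a precision-weighted mean of the per-object log ratios.

```python
import math

# Hypothetical priors: category -> (typical height in metres, std of log-height).
# Illustrative placeholders only, not values taken from Metric-Tree.
SIZE_PRIORS = {
    "chair": (0.9, 0.15),
    "door": (2.0, 0.05),
    "bottle": (0.25, 0.25),
}


def estimate_global_scale(observations, priors=SIZE_PRIORS):
    """Return the single scale s that best explains all observed sizes.

    observations: list of (category, size) pairs, with size measured in the
    arbitrary units of the monocular reconstruction.
    Assumes log(s * size) ~ Normal(log(typical), sigma^2) for each object,
    so maximising the joint log-probability over s gives a weighted mean.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for category, size in observations:
        typical, sigma = priors[category]
        weight = 1.0 / (sigma * sigma)  # tighter priors count for more
        weighted_sum += weight * (math.log(typical) - math.log(size))
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


if __name__ == "__main__":
    # Reconstructed sizes that are all half of the typical metric sizes.
    detected = [("chair", 0.45), ("door", 1.0), ("bottle", 0.125)]
    scale = estimate_global_scale(detected)
    print(f"estimated metres per reconstruction unit: {scale:.3f}")
```

Because every object shares the same scale, categories with narrow size distributions (the door in this toy prior table) dominate the estimate, while highly variable categories contribute little.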
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
