MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation
Jiaqi Yang, Yucong Chen, Xiangting Meng, Chenxin Yan, Min Li, Ran Cheng, Lige Liu, Tao Sun, Laurent Kneip

TL;DR
This paper introduces MV-ROPE, a multi-view framework that leverages RGB video streams, SLAM, and pose graph optimization to estimate object pose and size accurately without relying on high-quality depth sensors.
Contribution
The paper presents a novel multi-view approach combining SLAM, a lightweight pose predictor, and pose graph optimization for robust category-level object pose and size estimation from RGB videos.
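The object-level pose graph step can be illustrated with a deliberately simplified sketch. It assumes that camera poses from SLAM are held fixed, that every per-frame prediction is weighted equally, and that rotation error is measured with the chordal (Frobenius) distance. Under those assumptions, a graph with a single object node reduces to averaging the world-frame object poses. The rotation mean is the arithmetic mean projected back onto SO(3), and the translation mean is the plain average. The function names, the 4x4 pose convention, and the median size fusion are hypothetical choices for illustration. This is not the authors' optimizer, whose exact formulation is not given in this summary.

```python
import numpy as np


def project_to_so3(m: np.ndarray) -> np.ndarray:
    """Return the rotation matrix closest to m in the Frobenius norm."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def fuse_object_pose(T_wc_list, T_co_list, sizes):
    """Fuse per-frame object predictions into one world-frame pose and size.

    T_wc_list: 4x4 camera-to-world poses, assumed to come from SLAM and held fixed.
    T_co_list: 4x4 object-to-camera poses predicted independently in each frame.
    sizes: per-frame (x, y, z) extents of the object's bounding box.
    """
    rotations, translations = [], []
    for T_wc, T_co in zip(T_wc_list, T_co_list):
        T_wo = T_wc @ T_co  # this frame's estimate of the object pose in the world frame
        rotations.append(T_wo[:3, :3])
        translations.append(T_wo[:3, 3])

    fused = np.eye(4)
    # Chordal L2 rotation mean: average the matrices, then project back onto SO(3).
    fused[:3, :3] = project_to_so3(np.mean(rotations, axis=0))
    # With camera poses fixed, the translation residuals decouple and the mean is optimal.
    fused[:3, 3] = np.mean(translations, axis=0)
    # Per-axis median size: a hypothetical robustness heuristic, not taken from the paper.
    size = np.median(np.asarray(sizes, dtype=float), axis=0)
    return fused, size
```

Once camera poses are fixed, the rotation and translation residuals decouple. A closed form can then stand in for the iterative solver that a full pose graph needs. Letting the SLAM poses vary, or adding robust kernels, would bring the iterative solve back.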
Findings
Achieves accuracy comparable to RGB-D methods that use high-quality depth data.
Outperforms previous RGB-based methods in scenarios with limited or no depth information.
Demonstrates robustness and accuracy across multiple datasets with varying depth quality.
Abstract
Recently, there has been growing interest in category-level object pose and size estimation, and prevailing methods commonly rely on single-view RGB-D images. However, one disadvantage of such methods is that they require accurate depth maps, which consumer-grade sensors cannot produce. Furthermore, many practical real-world situations involve a moving camera that continuously observes its surroundings, and single-view methods simply overlook the temporal information in the input video streams. We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph optimizer. The SLAM module uses a video stream and additional scale-sensitive readings to estimate camera poses and metric depth. The object pose predictor then…
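The abstract says the SLAM module combines the video stream with additional scale-sensitive readings to recover metric depth, but this summary does not specify how. One common way to illustrate the idea is to fit a single global scale by least squares between up-to-scale monocular depth and sparse metric readings. The sketch below assumes exactly that. The function name, the zero-as-missing convention, and the one-scale-per-frame model are all hypothetical.

```python
import numpy as np


def fit_metric_scale(rel_depth, metric_depth, valid=None):
    """Return the least-squares scale s minimising sum((s * rel - metric)^2).

    rel_depth: up-to-scale depth from a monocular pipeline (hypothetical input).
    metric_depth: sparse or noisy metric readings; zeros mark missing values.
    valid: optional boolean mask; by default, pixels where both depths are positive.
    """
    rel = np.asarray(rel_depth, dtype=float).ravel()
    met = np.asarray(metric_depth, dtype=float).ravel()
    if valid is None:
        valid = (met > 0) & (rel > 0)
    else:
        valid = np.asarray(valid, dtype=bool).ravel()
    r, m = rel[valid], met[valid]
    if r.size == 0:
        raise ValueError("no valid depth correspondences")
    # Setting d/ds sum((s * r - m)^2) = 0 gives s = <r, m> / <r, r>.
    return float(np.dot(r, m) / np.dot(r, r))
```

A single scalar is the simplest possible model. Noisy consumer-grade readings would usually call for a robust fit, such as a median of per-pixel ratios, or for scale to be estimated jointly inside the SLAM back end.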
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
