Mono-hydra: Real-time 3D scene graph construction from monocular camera input with IMU
U.V.B.L. Udugama, G. Vosselman, F. Nex

TL;DR
Mono-Hydra is a real-time system that constructs 3D scene graphs from monocular camera input combined with IMU data, enabling efficient indoor navigation and environment understanding for robots.
Contribution
Introduces a real-time method for constructing 3D scene graphs from a monocular camera and an IMU, combining deep learning with visual-inertial odometry. The system focuses on indoor scenarios but is adaptable to outdoor use.
Findings
Achieves sub-20 cm accuracy while running in real time at 15 fps
Uses deep learning for depth and semantic extraction
Provides publicly available code for reproducibility
Abstract
The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics, such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships. However, building these representations using monocular vision systems in real-time remains a difficult task that has not been explored in depth. This paper puts forth a real-time spatial perception system Mono-Hydra, combining a monocular camera and an IMU sensor setup, focusing on indoor scenarios. However, the proposed approach is adaptable to outdoor applications, offering flexibility in its potential uses. The system employs a suite of deep learning algorithms to derive depth and semantics. It uses a…
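To make the "layered graph of concepts and their relationships" concrete, here is a minimal Python sketch of such a structure. It is an illustration only, not Mono-Hydra's implementation: the `SceneGraph` class, the layer names (including the `room` layer, which the abstract does not list), and the example nodes are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical layer names, ordered from low-level to high-level concepts.
LAYERS = ("object", "place", "room", "building")


@dataclass
class Node:
    node_id: str
    layer: str
    position: tuple  # (x, y, z) in metres, illustrative values only


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # node id -> Node
    parent: dict = field(default_factory=dict)  # child id -> parent id

    def add_node(self, node_id, layer, position, parent_id=None):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if parent_id is not None:
            parent_layer = self.nodes[parent_id].layer
            # A parent must sit in a strictly higher layer than its child.
            if LAYERS.index(parent_layer) <= LAYERS.index(layer):
                raise ValueError("parent must be in a higher layer")
            self.parent[node_id] = parent_id
        self.nodes[node_id] = Node(node_id, layer, position)

    def ancestors(self, node_id):
        """Walk inter-layer edges upward, e.g. object -> place -> room."""
        chain = []
        while node_id in self.parent:
            node_id = self.parent[node_id]
            chain.append(node_id)
        return chain


graph = SceneGraph()
graph.add_node("building_0", "building", (0.0, 0.0, 0.0))
graph.add_node("room_0", "room", (2.0, 1.0, 0.0), parent_id="building_0")
graph.add_node("place_0", "place", (2.5, 1.0, 0.0), parent_id="room_0")
graph.add_node("chair_0", "object", (2.6, 1.2, 0.0), parent_id="place_0")

print(graph.ancestors("chair_0"))  # ['place_0', 'room_0', 'building_0']
```

Storing one parent per node keeps the hierarchy a tree, so a query such as "which room contains this chair?" is a short upward walk. A fuller representation would also carry edges between nodes within the same layer.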
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
