Mono-Hydra++: Real-Time Monocular Scene Graph Construction with Multi-Task Learning for 3D Indoor Mapping

U. V. B. L. Udugama; George Vosselman; Francesco Nex

arXiv:2605.17661 · cs.RO · May 19, 2026

TL;DR

Mono-Hydra++ is a real-time monocular RGB-IMU system that constructs 3D indoor scene graphs, enabling semantic understanding for lightweight robots without active depth sensors.
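A 3D scene graph is a layered data structure: fine-grained entities such as objects sit at the bottom, and coarser entities such as places and rooms group them. The sketch below is a minimal, hypothetical illustration of that structure. The class names (`Node`, `SceneGraph`) and the layer list (`object`, `place`, `room`, `building`), loosely modeled on Hydra-style scene graphs, are assumptions for illustration. This is not the Mono-Hydra++ implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical layer ordering (assumption, loosely following Hydra-style graphs):
# each layer groups the entities of the layer below it.
LAYERS = ("object", "place", "room", "building")


@dataclass
class Node:
    node_id: str
    layer: str
    position: tuple[float, float, float]  # metric 3D centroid in the map frame
    label: str = ""                        # semantic class, e.g. "chair"


@dataclass
class SceneGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    parent: dict[str, str] = field(default_factory=dict)  # inter-layer edges: child -> parent
    siblings: set[frozenset[str]] = field(default_factory=set)  # intra-layer edges

    def add_node(self, node: Node) -> None:
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        self.nodes[node.node_id] = node

    def attach(self, child_id: str, parent_id: str) -> None:
        # Inter-layer edge: the parent must sit exactly one layer above the child.
        child, par = self.nodes[child_id], self.nodes[parent_id]
        if LAYERS.index(par.layer) != LAYERS.index(child.layer) + 1:
            raise ValueError("parent must be one layer above the child")
        self.parent[child_id] = parent_id

    def connect(self, a: str, b: str) -> None:
        # Intra-layer edge, e.g. traversability between two places.
        if self.nodes[a].layer != self.nodes[b].layer:
            raise ValueError("intra-layer edges must join nodes of the same layer")
        self.siblings.add(frozenset((a, b)))

    def ancestors(self, node_id: str) -> list[str]:
        # Walk upward through the hierarchy (object -> place -> room -> ...).
        chain = []
        while node_id in self.parent:
            node_id = self.parent[node_id]
            chain.append(node_id)
        return chain


# Toy usage with made-up coordinates.
g = SceneGraph()
g.add_node(Node("o1", "object", (1.0, 0.5, 0.4), "chair"))
g.add_node(Node("p1", "place", (1.2, 0.6, 1.0)))
g.add_node(Node("p2", "place", (2.5, 0.6, 1.0)))
g.add_node(Node("r1", "room", (2.0, 2.0, 1.5), "kitchen"))
g.attach("o1", "p1")
g.attach("p1", "r1")
g.connect("p1", "p2")
print(g.ancestors("o1"))  # ['p1', 'r1']
```

Keeping inter-layer edges as a child-to-parent map makes "which room is this object in?" a short upward walk. Real systems typically also store per-node attributes such as bounding boxes and mesh associations.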

Contribution

It introduces a novel multi-task deep model and a pipeline for real-time semantic mapping and scene graph construction using only monocular RGB and IMU data.

Findings

01. Achieves 1.6% lower trajectory error than the RGB-D baseline on ScanNet.
02. Improves average absolute trajectory error (ATE) by 29.8% over calibrated baselines on 7-Scenes.
03. Runs at 25.53 FPS on an NVIDIA Jetson Orin NX, including the embedded perception model.
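The findings quote trajectory error, a percentage improvement, and a frame rate. The sketch below shows how such numbers are commonly related. It assumes the percentages are relative reductions and that ATE is the RMSE of translational error over trajectories that have already been time-associated and aligned. This page does not state the exact alignment or averaging protocol. The function names are hypothetical, and the toy trajectories are made up rather than taken from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between time-associated 3D positions.

    Assumes both trajectories are already associated frame by frame and
    expressed in a common frame (e.g. after a rigid or similarity alignment).
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction_pct(baseline, ours):
    """Percentage by which `ours` is lower than `baseline` (positive = better)."""
    return 100.0 * (baseline - ours) / baseline


def frame_latency_ms(fps):
    """Per-frame time budget implied by a throughput in frames per second."""
    return 1000.0 / fps


# Toy trajectories (hypothetical): the second pose is off by a 3-4-5 offset.
est = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(ate_rmse(est, gt))                  # sqrt((0 + 25) / 2)
print(round(frame_latency_ms(25.53), 2))  # 39.17
```

Under these assumptions, 25.53 FPS corresponds to roughly 39 ms per frame for the full pipeline. An "average ATE" would usually be the mean of per-scene `ate_rmse` values before `relative_reduction_pct` is applied.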

Abstract

Autonomous agile robots need more than metric geometry: they must understand objects, rooms, places, and spatial relations for search, inspection, exploration, and human-robot interaction. Conventional metric maps support localization and collision avoidance but do not provide this semantic and relational structure. 3D scene graphs address this gap by connecting geometry with object-level and room-level understanding. Building such representations on agile platforms remains difficult because aerial and lightweight robots operate under strict payload, power, and compute limits, which makes RGB-D cameras and LiDAR sensors impractical for many onboard settings. We present Mono-Hydra++, a real-time monocular RGB-IMU pipeline for indoor metric-semantic mapping and hierarchical 3D scene graph construction. The system combines M2H-MX, a DINOv3-based multi-task model for depth and semantics,…
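Metric-semantic mapping from a multi-task depth-and-semantics model typically starts by lifting each labeled pixel into 3D. The sketch below is a minimal illustration under stated assumptions. It assumes a pinhole camera, depth in metres, per-pixel class labels, and a camera-to-world pose (for example from visual-inertial odometry). All names (`PinholeIntrinsics`, `backproject_labeled`, `to_world`) and the `max_depth` cutoff are hypothetical. The actual Mono-Hydra++ fusion step is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class PinholeIntrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


def backproject_labeled(depth, labels, K, max_depth=5.0):
    """Lift per-pixel depth (metres) and semantic labels to labeled 3D points.

    depth, labels: row-major 2D lists of equal shape (H x W).
    Returns (x, y, z, label) tuples in the camera frame (x right, y down, z forward).
    Pixels with non-positive depth or depth beyond max_depth are skipped.
    """
    points = []
    for v, (drow, lrow) in enumerate(zip(depth, labels)):
        for u, (z, label) in enumerate(zip(drow, lrow)):
            if not (0.0 < z <= max_depth):
                continue
            x = (u - K.cx) * z / K.fx
            y = (v - K.cy) * z / K.fy
            points.append((x, y, z, label))
    return points


def to_world(points, R, t):
    """Apply a camera-to-world pose: 3x3 rotation R (nested lists) and translation t."""
    out = []
    for x, y, z, label in points:
        p = (x, y, z)
        w = tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        out.append((*w, label))
    return out


# Toy 2x3 "image" with made-up depth and labels.
K = PinholeIntrinsics(fx=500.0, fy=500.0, cx=1.0, cy=0.5)
depth = [[2.0, 0.0, 2.0], [1.0, 1.0, 9.0]]
labels = [["wall", "wall", "chair"], ["floor", "floor", "wall"]]
pts = backproject_labeled(depth, labels, K)  # invalid (0.0) and far (9.0) pixels dropped
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world = to_world(pts, identity, (0.0, 0.0, 1.0))
print([p[3] for p in pts])  # ['wall', 'chair', 'floor', 'floor']
```

In a full system, these world-frame labeled points would then be fused into a volumetric map from which scene-graph layers are extracted. That fusion step is outside the scope of this sketch.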

Code & Models

Models

Bavantha11/m2h-mx (Hugging Face model)