M2H-MX: Multi-Task Dense Visual Perception for Real-Time Monocular Spatial Understanding

U.V.B.L. Udugama; George Vosselman; Francesco Nex

arXiv:2603.29236·cs.CV·April 1, 2026

TL;DR

This paper introduces M2H-MX, a real-time multi-task perception model that feeds dense depth and semantic predictions into a monocular SLAM pipeline, reporting state-of-the-art semantic mIoU on NYUDv2 and lower trajectory error in real-time mapping on ScanNet.

Contribution

The paper presents M2H-MX, a lightweight multi-task model with novel global context and cross-task interaction mechanisms for real-time monocular perception.

Findings

01 M2H-MX-L achieves state-of-the-art semantic mIoU on NYUDv2.

02 Reduces depth RMSE by 9.4% on NYUDv2.

03 Decreases trajectory error by 60.7% in real-time monocular mapping on ScanNet.
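
For readers unfamiliar with the metrics named in the findings, the following is a minimal sketch of how depth RMSE, semantic mIoU, and a percentage reduction are commonly computed. It is not the paper's evaluation code. The function names, the `gt > 0` valid-depth mask, the ignore label `255`, and the reading of "reduces by X%" as a relative reduction against a baseline are all assumptions made for illustration, and the numbers in the usage lines are made up.

```python
import math


def depth_rmse(pred, gt):
    """Root-mean-square depth error over pixels with valid ground truth.

    Assumption: a ground-truth depth of 0 marks an invalid pixel.
    `pred` and `gt` are flat sequences of per-pixel depths.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def mean_iou(pred, gt, num_classes, ignore_label=255):
    """Mean intersection-over-union across classes that appear at least once.

    Assumption: pixels whose ground-truth label equals `ignore_label`
    are skipped. `pred` and `gt` are flat sequences of class indices.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels")
    return sum(ious) / len(ious)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Illustrative values only; these are not results from the paper.
rmse = depth_rmse([1.0, 2.0, 5.0], [1.0, 4.0, 0.0])
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
drop = relative_reduction(2.0, 1.5)
```

Under this reading, a reported percentage reduction in RMSE or trajectory error only has meaning relative to a named baseline, which this summary does not state.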

Abstract

Monocular cameras are attractive for robotic perception due to their low cost and ease of deployment, yet achieving reliable real-time spatial understanding from a single image stream remains challenging. While recent multi-task dense prediction models have improved per-pixel depth and semantic estimation, translating these advances into stable monocular mapping systems is still non-trivial. This paper presents M2H-MX, a real-time multi-task perception model for monocular spatial understanding. The model preserves multi-scale feature representations while introducing register-gated global context and controlled cross-task interaction in a lightweight decoder, enabling depth and semantic predictions to reinforce each other under strict latency constraints. Its outputs integrate directly into an unmodified monocular SLAM pipeline through a compact perception-to-mapping interface. We…

Code & Models

Models

Bavantha11/m2h-mx (Hugging Face model)
