LMDepth: Lightweight Mamba-based Monocular Depth Estimation for Real-World Deployment

Jiahuan Long; Xin Zhou

arXiv:2505.00980 · cs.CV · May 5, 2025

TL;DR

LMDepth is a lightweight, efficient monocular depth estimation network that balances high accuracy with low computational cost, suitable for deployment on resource-constrained devices.

Contribution

The paper introduces LMDepth, a novel Mamba-based architecture with a pyramid spatial pooling module and depth Mamba blocks, offering a lightweight alternative to Transformer-based methods.

Findings

1. Outperforms previous lightweight methods on the NYUDv2 and KITTI datasets.

2. Achieves higher accuracy with fewer parameters and GFLOPs.

3. Successfully deployed on an embedded platform with INT8 quantization.
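The INT8 quantization mentioned in the last finding can be illustrated with a minimal sketch. The page does not describe the authors' quantization pipeline, so this assumes plain symmetric per-tensor quantization; the names `quantize_int8` and `dequantize_int8` are hypothetical and are not taken from the paper.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
# Assumption: this is a generic scheme for illustration only; the paper's
# actual deployment toolchain is not described on this page.

def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to code 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round() picks the nearest integer code; the clamp guards against
    # floating-point overshoot at the extremes.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]  # hypothetical weight values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from its original by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for 8-bit storage and integer arithmetic on the embedded target.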

Abstract

Monocular depth estimation provides an additional depth dimension to RGB images, making it widely applicable in various fields such as virtual reality, autonomous driving and robotic navigation. However, existing depth estimation algorithms often struggle to effectively balance performance and computational efficiency, which poses challenges for deployment on resource-constrained devices. To address this, we propose LMDepth, a lightweight Mamba-based monocular depth estimation network, designed to reconstruct high-precision depth information while maintaining low computational overhead. Specifically, we propose a modified pyramid spatial pooling module that serves as a multi-scale feature aggregator and context extractor, ensuring global spatial information for accurate depth estimation. Moreover, we integrate multiple depth Mamba blocks into the decoder. Designed with linear…
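To make the idea of a pyramid pooling module as a multi-scale aggregator concrete, here is a minimal sketch of generic pyramid pooling on a single 2D feature map. It is not the modified module proposed in the paper, whose details are not given on this page; the function names and bin sizes are assumptions chosen for illustration.

```python
# Minimal sketch of generic pyramid pooling on one 2D feature map.
# Assumption: this illustrates the general technique only, not the paper's
# modified module. All names and bin sizes here are hypothetical.

def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                    # floor of region start
        r1 = ((i + 1) * h + bins - 1) // bins   # ceil of region end
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a pooled grid back to h x w by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[y * bins // h][x * bins // w] for x in range(w)]
            for y in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2)):
    """Return the input map plus one context map per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels


fmap = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
channels = pyramid_pool(fmap, bin_sizes=(1, 2))
```

With bin size 1, every position carries the global mean of the map; with bin size 2, each position carries the mean of its quadrant. Stacking these maps with the input gives each pixel both local detail and coarser context, which is the role the abstract assigns to the pooling module.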

Taxonomy

Topics: Hand Gesture Recognition Systems · Advanced Vision and Imaging · Robotic Mechanisms and Dynamics

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces