LOD-Net: Locality-Aware 3D Object Detection Using Multi-Scale Transformer Network

Mustaqeem Khan; Aidana Nurakhmetova; Wail Gueaieb; Abdulmotaleb El Saddik

arXiv:2604.16696 · cs.CV · April 21, 2026

TL;DR

This paper introduces LOD-Net, a multi-scale transformer-based approach to 3D object detection in point clouds that captures both local geometry and global context to improve detection accuracy.

Contribution

The paper proposes a Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture, together with an upsampling operation that generates high-resolution feature maps and improves detection of smaller and semantically related objects.

Findings

01. Achieves nearly 1% improvement in mAP@25 on ScanNetv2

02. Gains 4.78% in mAP@50 over baseline

03. Highlights the importance of adaptive upsampling for lightweight models
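
The mAP@25 and mAP@50 figures above are mean average precision computed at 3D IoU thresholds of 0.25 and 0.5. The sketch below illustrates only the matching step behind those thresholds, assuming axis-aligned boxes given as (x0, y0, z0, x1, y1, z1). The names `iou_3d` and `is_match` are hypothetical and are not taken from the paper or from any benchmark toolkit.

```python
def box_volume(box):
    """Volume of an axis-aligned box (x0, y0, z0, x1, y1, z1)."""
    x0, y0, z0, x1, y1, z1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    # The overlap box is bounded by the larger minimum and the smaller maximum on each axis.
    overlap = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = box_volume(overlap)  # zero when the boxes do not overlap
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold):
    """A prediction counts as a hit when its IoU with the ground truth reaches the threshold."""
    return iou_3d(pred, gt) >= threshold


pred = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
gt = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)
print(iou_3d(pred, gt))          # overlap 4, union 12
print(is_match(pred, gt, 0.25))  # counted under mAP@25
print(is_match(pred, gt, 0.50))  # not counted under mAP@50
```

The same prediction can count as correct at the 0.25 threshold and incorrect at 0.5, which is why a gain in mAP@50 indicates tighter box localization rather than just more detections.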

Abstract

3D object detection in point cloud data remains a challenging task due to the sparsity and lack of global structure inherent in the input. In this work, we propose a novel Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture to better capture both local geometry and global context. Our method introduces an upsampling operation that generates high-resolution feature maps, enabling the network to better detect smaller and semantically related objects. Experiments conducted on the ScanNetv2 dataset demonstrate that our 3DETR + MSA model improves detection performance, achieving a gain of almost 1% in mAP@25 and 4.78% in mAP@50 over the baseline. While applying MSA to the 3DETR-m variant shows limited improvement, our analysis reveals the importance of adapting the upsampling strategy for lightweight models. These results highlight the effectiveness of combining…
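
As a rough illustration of the idea the abstract describes (attention over features at more than one resolution, with an upsampling step producing the finer one), here is a minimal pure-Python sketch. It is not the authors' implementation: the nearest-neighbor upsampling, the plain averaging of the two scales, and every function name (`attend`, `upsample_nearest`, `multi_scale_attention`) are assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mean of the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def upsample_nearest(coarse_xyz, coarse_feats, fine_xyz):
    """Assumed upsampling: copy to each fine point the feature of its nearest coarse point."""
    fine_feats = []
    for p in fine_xyz:
        nearest = min(
            range(len(coarse_xyz)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, coarse_xyz[i])),
        )
        fine_feats.append(list(coarse_feats[nearest]))
    return fine_feats


def multi_scale_attention(queries, coarse_xyz, coarse_feats, fine_xyz):
    """Attend to coarse and upsampled features separately, then average (an assumed fusion)."""
    fine_feats = upsample_nearest(coarse_xyz, coarse_feats, fine_xyz)
    coarse_out = attend(queries, coarse_feats, coarse_feats)
    fine_out = attend(queries, fine_feats, fine_feats)
    return [[(c + f) / 2.0 for c, f in zip(co, fo)] for co, fo in zip(coarse_out, fine_out)]


coarse_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
coarse_feats = [[1.0, 0.0], [0.0, 1.0]]
fine_xyz = [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.9, 0.0, 0.0), (1.1, 0.0, 0.0)]
queries = [[1.0, 0.0]]
print(multi_scale_attention(queries, coarse_xyz, coarse_feats, fine_xyz))
```

In a real detector the fine-scale features would come from learned layers rather than copied neighbors, and the fusion would typically be learned as well; the sketch only shows how queries can draw on two resolutions at once.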
