SemLT3D: Semantic-Guided Expert Distillation for Camera-only Long-Tailed 3D Object Detection

Hao Vo; Khoa Vo; Thinh Phan; Ngo Xuan Cuong; Gianfranco Doretto; Hien Nguyen; Anh Nguyen; Ngan Le

arXiv:2604.18476 · cs.CV · April 21, 2026

TL;DR

SemLT3D introduces a semantic-guided expert distillation framework that improves long-tailed 3D object detection from camera-only input by leveraging semantic priors and CLIP-informed features.

Contribution

The paper proposes a novel semantic-guided expert routing and distillation approach to improve recognition of rare classes in camera-only 3D detection.
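This summary names semantic-guided expert routing and distillation but does not specify the mechanism. The following is a minimal sketch under stated assumptions and is not SemLT3D's implementation. Each expert is tied to a hypothetical group of classes. An object feature is routed to the expert whose group text embedding has the highest cosine similarity; that embedding stands in for a CLIP text embedding of a class-group prompt. A student is then pulled toward the routed expert's prediction with standard Hinton-style temperature-scaled KL distillation, which is a swapped-in, generic loss. The helpers `route_to_expert` and `distillation_loss`, the class grouping, and all vectors are hypothetical toy values.

```python
import math

# Hypothetical sketch only: not SemLT3D's published code or loss.

def normalize(v):
    """Scale a vector to unit L2 norm (raises on a zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_to_expert(obj_feature, group_text_embeddings):
    """Hypothetical router: index of the class-group embedding most similar to the feature."""
    sims = [cosine(obj_feature, t) for t in group_text_embeddings]
    return max(range(len(sims)), key=sims.__getitem__)

def distillation_loss(student_logits, expert_logits, temperature=2.0):
    """Soft-target KD: T^2 * KL(p_expert || p_student) on temperature-softened probabilities."""
    p_t = softmax(expert_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl

# Toy 3-d stand-ins for CLIP text embeddings of two hypothetical class groups.
group_text = [
    [1.0, 0.0, 0.0],  # e.g. a "frequent vehicles" prompt (hypothetical)
    [0.0, 1.0, 0.0],  # e.g. a "rare vulnerable road users" prompt (hypothetical)
]
feature = [0.2, 0.9, 0.1]  # toy object feature from the camera branch
expert = route_to_expert(feature, group_text)  # → 1

expert_logits = [[2.0, 0.5, -1.0], [-0.5, 0.3, 2.2]]  # toy per-expert class logits
student_logits = [0.1, 0.2, 0.4]
loss = distillation_loss(student_logits, expert_logits[expert])
```

For readability, this uses hard top-1 routing. A soft mixture that weights experts by a softmax over the similarities is an equally plausible design, and the summary does not say which one SemLT3D uses.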

Findings

01

Improves detection accuracy for underrepresented classes.

02

Enhances robustness to appearance variations and challenging cases.

03

Leverages semantic priors and CLIP features for better feature discrimination.

Abstract

Camera-only 3D object detection has emerged as a cost-effective and scalable alternative to LiDAR for autonomous driving, yet existing methods primarily prioritize overall performance while overlooking the severe long-tail imbalance inherent in real-world datasets. In practice, many rare but safety-critical categories such as children, strollers, or emergency vehicles are heavily underrepresented, leading to biased learning and degraded performance. This challenge is further exacerbated by pronounced inter-class ambiguity (e.g., visually similar subclasses) and substantial intra-class diversity (e.g., objects varying widely in appearance, scale, pose, or context), which together hinder reliable long-tail recognition. In this work, we introduce SemLT3D, a Semantic-Guided Expert Distillation framework designed to enrich the representation space for underrepresented classes through…
