MonoCLUE: Object-Aware Clustering Enhances Monocular 3D Object Detection
Sunghun Yang, Minhyeok Lee, Jungho Lee, Sangyoun Lee

TL;DR
MonoCLUE enhances monocular 3D object detection by combining local clustering of visual features with a generalized scene memory, improving detection accuracy in occluded and truncated scenes, and achieving state-of-the-art results on KITTI.
Contribution
The paper introduces an approach that combines object-aware clustering of visual features with a generalized scene memory to improve the robustness and accuracy of monocular 3D detection.
Findings
Achieves state-of-the-art performance on the KITTI benchmark.
Improves detection of partially visible objects.
Enhances robustness in occluded and limited-visibility scenarios.
Abstract
Monocular 3D object detection offers a cost-effective solution for autonomous driving but suffers from ill-posed depth and limited field of view. These constraints cause a lack of geometric cues and reduced accuracy in occluded or truncated scenes. While recent approaches incorporate additional depth information to address geometric ambiguity, they overlook the visual cues crucial for robust recognition. We propose MonoCLUE, which enhances monocular 3D detection by leveraging both local clustering and generalized scene memory of visual features. First, we perform K-means clustering on visual features to capture distinct object-level appearance parts (e.g., bonnet, car roof), improving detection of partially visible objects. The clustered features are propagated across regions to capture objects with similar appearances. Second, we construct a generalized scene memory by aggregating…
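The clustering step the abstract mentions can be illustrated with a minimal sketch of plain K-means over feature vectors. This is a toy illustration under stated assumptions, not the authors' implementation: the abstract does not say how features are extracted, how many clusters are used, or how clustered features are propagated, so the function names (`kmeans`, `squared_distance`), the parameters `k`, `iters`, and `seed`, and the representation of features as tuples of floats are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20, seed=0):
    """Plain K-means (Lloyd's algorithm) over a list of feature vectors.

    Assumption: each entry of `features` stands in for the visual feature
    at one spatial location; a real detector would use backbone features.
    Returns (centroids, labels), where labels[i] is the cluster of features[i].
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct input vectors.
    centroids = [list(f) for f in rng.sample(features, k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: attach each feature to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Toy usage: two well-separated groups of 2-D "features".
toy_features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
toy_centroids, toy_labels = kmeans(toy_features, k=2)
```

In this sketch each cluster would loosely correspond to one appearance part (the abstract's examples are a bonnet or a car roof), so a partially visible object could still match the clusters for the parts that remain in view.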
Taxonomy
Topics: Advanced Neural Network Applications · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
