VirPro: Visual-referred Probabilistic Prompt Learning for Weakly-Supervised Monocular 3D Detection
Chupeng Liu, Jiyong Rao, Shangquan Sun, Runkai Zhao, Weidong Cai

TL;DR
VirPro introduces a multi-modal pretraining approach that uses adaptive, scene-aware prompts with modeled visual uncertainty to improve weakly-supervised monocular 3D detection, reporting gains of up to 4.8% AP on the KITTI benchmark.
Contribution
The paper proposes VirPro, a new probabilistic prompt learning framework that integrates visual features into textual prompts, enhancing weakly-supervised monocular 3D detection.
Findings
Achieves up to a 4.8% AP improvement on the KITTI benchmark.
Effectively models visual uncertainties with Multi-Gaussian Prompt Modeling.
Enhances semantic coherence through modality alignment with contrastive learning.
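The two ideas named in the findings above, Gaussian modeling of prompt embeddings and contrastive modality alignment, can be illustrated with a generic sketch. This is not the authors' implementation: the function names, the diagonal-Gaussian parameterization, the temperature of 0.07, and the toy random data are all assumptions chosen for illustration, and the paper's actual Multi-Gaussian Prompt Modeling may differ substantially.

```python
import numpy as np

def sample_prompt_embeddings(mu, log_var, num_samples, rng):
    """Sample from diagonal Gaussians N(mu, diag(exp(log_var))).

    Uses the reparameterization mu + eps * std (hypothetical helper,
    not taken from the paper). Returns shape (num_samples, N, D).
    """
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_samples,) + mu.shape)
    return mu + eps * std

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(visual, text, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input is a matched pair."""
    v = l2_normalize(visual)
    t = l2_normalize(text)
    logits = v @ t.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)

# Toy data (assumed): 4 object instances with 8-dimensional features.
rng = np.random.default_rng(0)
num_instances, dim = 4, 8
visual = rng.standard_normal((num_instances, dim))

# Assumed: each instance has a prompt distribution centered near its visual feature.
mu = visual + 0.1 * rng.standard_normal((num_instances, dim))
log_var = np.full((num_instances, dim), -4.0)

samples = sample_prompt_embeddings(mu, log_var, num_samples=5, rng=rng)
text = samples.mean(axis=0)  # average the sampled prompts per instance
loss = symmetric_info_nce(visual, text)
print(f"alignment loss: {loss:.4f}")
```

In this sketch the variance controls how far sampled prompts spread around their mean, which is one simple way to represent uncertainty, and the contrastive loss is lower when each visual feature is closest to its own prompt embedding than to those of other instances.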
Abstract
Monocular 3D object detection typically relies on pseudo-labeling techniques to reduce dependency on real-world annotations. Recent advances demonstrate that deterministic linguistic cues can serve as effective auxiliary weak supervision signals, providing complementary semantic context. However, hand-crafted textual descriptions struggle to capture the inherent visual diversity of individuals across scenes, limiting the model's ability to learn scene-aware representations. To address this challenge, we propose Visual-referred Probabilistic Prompt Learning (VirPro), an adaptive multi-modal pretraining paradigm that can be seamlessly integrated into diverse weakly supervised monocular 3D detection frameworks. Specifically, we generate a diverse set of learnable, instance-conditioned prompts across scenes and store them in an Adaptive Prompt Bank (APB). Subsequently, we introduce…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
