VirPro: Visual-referred Probabilistic Prompt Learning for Weakly-Supervised Monocular 3D Detection

Chupeng Liu; Jiyong Rao; Shangquan Sun; Runkai Zhao; Weidong Cai

arXiv:2603.17470·cs.CV·March 23, 2026

Open Access

TL;DR

VirPro is a multi-modal pretraining approach that uses adaptive, scene-aware prompts modeling visual uncertainty to improve weakly-supervised monocular 3D detection, with reported gains of up to 4.8% AP on the KITTI benchmark.

Contribution

The paper proposes VirPro, a probabilistic prompt learning framework that integrates visual features into textual prompts to enhance weakly-supervised monocular 3D detection.

Findings

01. Achieves up to a 4.8% AP improvement on the KITTI benchmark.

02. Models visual uncertainty with Multi-Gaussian Prompt Modeling.

03. Enhances semantic coherence through modality alignment with contrastive learning.
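
The findings name two mechanisms, Gaussian-based prompt modeling and contrastive modality alignment, without giving detail on this page. The sketch below is only a generic illustration of those two ideas and is not the paper's implementation. It assumes each instance owns K diagonal Gaussians over prompt embeddings, draws one reparameterised sample per Gaussian, averages the samples into a single text-side embedding (a simplification chosen here), and scores visual/text agreement with a symmetric InfoNCE loss. All names, shapes, and the temperature value are hypothetical.

```python
import numpy as np


def sample_prompt_embeddings(means, log_vars, rng):
    """Reparameterised draw from diagonal Gaussians: z = mu + sigma * eps."""
    eps = rng.standard_normal(means.shape)
    return means + np.exp(0.5 * log_vars) * eps


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(visual, text, temperature=0.07):
    """Symmetric InfoNCE: row i of `visual` is the positive for row i of `text`."""
    logits = l2_normalize(visual) @ l2_normalize(text).T / temperature  # (N, N)

    def diagonal_cross_entropy(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Hypothetical sizes: 4 object instances, 3 Gaussians each, 16-dim embeddings.
rng = np.random.default_rng(0)
n_instances, n_gaussians, dim = 4, 3, 16

visual = rng.standard_normal((n_instances, dim))              # stand-in visual features
means = rng.standard_normal((n_instances, n_gaussians, dim))  # stand-in learnable means
log_vars = np.full((n_instances, n_gaussians, dim), -2.0)     # stand-in log-variances

samples = sample_prompt_embeddings(means, log_vars, rng)      # (N, K, D)
text = samples.mean(axis=1)                                   # (N, D), assumed pooling
loss = info_nce(visual, text)
aligned_loss = info_nce(visual, visual)                       # perfectly matched pairs
```

In a trainable version the means and log-variances would be parameters updated by gradient descent, so that `loss` falls toward the value seen for matched pairs; this NumPy sketch only evaluates the forward computation.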

Abstract

Monocular 3D object detection typically relies on pseudo-labeling techniques to reduce dependency on real-world annotations. Recent advances demonstrate that deterministic linguistic cues can serve as effective auxiliary weak supervision signals, providing complementary semantic context. However, hand-crafted textual descriptions struggle to capture the inherent visual diversity of individuals across scenes, limiting the model's ability to learn scene-aware representations. To address this challenge, we propose Visual-referred Probabilistic Prompt Learning (VirPro), an adaptive multi-modal pretraining paradigm that can be seamlessly integrated into diverse weakly supervised monocular 3D detection frameworks. Specifically, we generate a diverse set of learnable, instance-conditioned prompts across scenes and store them in an Adaptive Prompt Bank (APB). Subsequently, we introduce…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis