PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection

Seokyeong Lee; Sithu Aung; Junyong Choi; Seungryong Kim; Ig-Jae Kim; Junghyun Cho

arXiv:2507.02393 · cs.CV · July 8, 2025

TL;DR

This paper introduces PLOT, a pseudo-labeling framework using video object tracking to improve monocular 3D object detection without extra sensors or multi-view setups, enhancing robustness and scalability.

Contribution

The paper presents a novel pseudo-labeling approach leveraging video data and object tracking, enabling 3D attribute extraction without additional sensors or multi-view configurations.

Findings

01 Achieves reliable accuracy in monocular 3D detection

02 Demonstrates strong scalability across datasets

03 Operates effectively without multi-view setups or extra sensors

Abstract

Monocular 3D object detection (M3OD) has long faced challenges due to data scarcity caused by high annotation costs and inherent 2D-to-3D ambiguity. Although various weakly supervised methods and pseudo-labeling methods have been proposed to address these issues, they are mostly limited by domain-specific learning or rely solely on shape information from a single observation. In this paper, we propose a novel pseudo-labeling framework that uses only video data and is more robust to occlusion, without requiring a multi-view setup, additional sensors, camera poses, or domain-specific training. Specifically, we explore a technique for aggregating the pseudo-LiDARs of both static and dynamic objects across temporally adjacent frames using object point tracking, enabling 3D attribute extraction in scenarios where 3D data acquisition is infeasible. Extensive experiments demonstrate that our…
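
To make the aggregation idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a per-pixel metric depth estimate, and point-track correspondences between two adjacent frames. The function names (backproject, aggregate) and the translation-only alignment are illustrative assumptions; the paper's actual alignment procedure is not described in this summary.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(pixels: Sequence[Tuple[float, float]],
                depths: Sequence[float],
                fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Lift (u, v) pixels with depth z to camera-frame 3D points (pinhole model).

    The resulting point set is what is commonly called a pseudo-LiDAR.
    """
    points = []
    for (u, v), z in zip(pixels, depths):
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


def aggregate(ref_points: List[Point3D],
              other_points: List[Point3D],
              tracked_pairs: Sequence[Tuple[int, int]]) -> List[Point3D]:
    """Merge an object's points from an adjacent frame into the reference frame.

    tracked_pairs holds (index into other_points, index into ref_points) for
    points that a point tracker matched across the two frames.
    ASSUMPTION: the object's apparent motion between frames is approximated by
    a pure translation, estimated as the mean offset of the tracked pairs.
    """
    n = len(tracked_pairs)
    if n == 0:
        raise ValueError("need at least one tracked correspondence")
    # Mean offset that carries tracked points in the other frame onto the reference.
    t = [0.0, 0.0, 0.0]
    for i, j in tracked_pairs:
        for k in range(3):
            t[k] += (ref_points[j][k] - other_points[i][k]) / n
    moved = [(p[0] + t[0], p[1] + t[1], p[2] + t[2]) for p in other_points]
    return list(ref_points) + moved


# Usage: two tracked points align the frames; the third point, visible only
# in the adjacent frame, extends the object's point set in the reference frame.
ref = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)]
other = [(0.5, 0.0, 6.0), (1.5, 0.0, 6.0), (2.5, 0.0, 6.0)]
merged = aggregate(ref, other, tracked_pairs=[(0, 0), (1, 1)])
```

Aligning through tracked points rather than camera poses keeps the sketch in line with the abstract's claim that no poses are required. A real system would likely need a rotation estimate and outlier rejection as well, both omitted here.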
