Objects as Spatio-Temporal 2.5D points

Paridhi Singh; Gaurav Singh; Arun Kumar

arXiv:2212.02755 · cs.CV · December 8, 2022

TL;DR

This paper introduces a lightweight, weakly supervised method for estimating 3D object positions in bird's eye view by jointly learning 2D detections and scene depth, without requiring 3D annotations.

Contribution

It extends a single-shot detector to model objects as spatio-temporal BEV points. Training uses only 2D object supervision plus LiDAR point clouds (LiDAR is needed during training only), which eliminates the need for 3D annotations.
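
The summary does not say exactly how the LiDAR point clouds supervise depth. One common way to turn LiDAR into weak depth supervision is to project the points into the image, keep the nearest return per pixel, and apply a loss only on pixels that received a return. The sketch below shows that idea under stated assumptions: the points are already in the camera frame, `K` is a 3x3 pinhole intrinsics matrix, and the L1 loss and the function names `lidar_to_sparse_depth` / `masked_l1_depth_loss` are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def lidar_to_sparse_depth(points_cam, K, height, width):
    """Hypothetical helper: project LiDAR points (N, 3), already expressed in the
    camera frame, into a (height, width) sparse depth map. 0 marks 'no return'."""
    # Keep only points in front of the camera (positive forward depth).
    pts = points_cam[points_cam[:, 2] > 0]
    # Pinhole projection: [u*z, v*z, z]^T = K @ [X, Y, Z]^T
    uvw = pts @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2].astype(np.float32)
    # Discard projections that fall outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    # Several points may hit the same pixel; keep the nearest one.
    # np.minimum.at is unbuffered, so repeated indices are handled correctly.
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth


def masked_l1_depth_loss(pred_depth, sparse_depth):
    """Hypothetical loss: mean absolute depth error over pixels with a LiDAR return."""
    valid = sparse_depth > 0
    if not np.any(valid):
        return 0.0
    return float(np.mean(np.abs(pred_depth[valid] - sparse_depth[valid])))
```

Masking matters because LiDAR covers only a small fraction of image pixels; averaging over all pixels would push the network toward predicting zero depth wherever there is no return.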

Findings

01

Achieves accuracy comparable to state-of-the-art methods on the KITTI benchmark.

02

More than 10x more computationally efficient than recent approaches.

03

Effectively models object tracks as BEV points without 3D annotations.

Abstract

Determining accurate bird's eye view (BEV) positions of objects and tracks in a scene is vital for various perception tasks, including object-interaction mapping and scenario extraction; however, the level of supervision required to accomplish this is extremely difficult to procure. We propose a lightweight, weakly supervised method that estimates the 3D positions of objects by jointly learning to regress 2D object detections and scene depth in a single feed-forward pass of a network. Our method extends a center-point-based single-shot object detector and introduces a novel object representation in which each object is modeled spatio-temporally as a BEV point, without requiring any 3D or BEV annotations for training or LiDAR data at query time. The approach leverages readily available 2D object supervision along with LiDAR point clouds (used only during training)…
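
The abstract does not spell out how a 2D detection and a depth prediction combine into a BEV point. A minimal sketch, assuming a calibrated pinhole camera and assuming the object's depth is read from the predicted depth map at its 2D center (for example, a heatmap peak of a center-point detector), is standard back-projection: x = (u - c_x) * z / f_x, keeping the lateral coordinate x and the forward depth z. The `Intrinsics` class and the functions `center_to_bev` / `detections_to_bev` are hypothetical names for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Hypothetical pinhole intrinsics, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def center_to_bev(u, v, depth, K):
    """Back-project pixel (u, v) at metric depth `depth` and return its BEV
    location (x lateral, z forward). The vertical coordinate is dropped in BEV,
    so `v` is unused here; it is kept for a uniform (u, v) interface."""
    x = (u - K.cx) * depth / K.fx
    return (x, depth)


def detections_to_bev(centers, depth_map, K):
    """centers: integer (u, v) pixel centers of detections.
    depth_map: predicted depth in metres, indexed depth_map[v][u]."""
    return [center_to_bev(u, v, depth_map[v][u], K) for (u, v) in centers]
```

Applied frame by frame, this yields one BEV point per detection; a track would then be the sequence of such points for the same object over time. How the paper associates detections across frames is not described in this summary.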
