3D Video Object Detection with Learnable Object-Centric Global Optimization
Jiawei He, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang

TL;DR
This paper introduces BA-Det, an end-to-end 3D video object detection method that leverages learnable object-centric global optimization and temporal correspondence learning to improve accuracy and efficiency.
Contribution
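It proposes an object-centric optimization framework with feature-metric bundle adjustment for 3D video object detection, so that moving objects, which violate multi-view geometry constraints and are usually discarded as outliers in scene reconstruction, can be optimized directly.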
Findings
Achieves state-of-the-art results on the Waymo Open Dataset
Demonstrates effectiveness across multiple baseline 3D detectors
Adds only marginal computational overhead
Abstract
We explore long-term temporal visual correspondence-based optimization for 3D video object detection in this work. Visual correspondence refers to one-to-one mappings for pixels across multiple images. Correspondence-based optimization is the cornerstone for 3D scene reconstruction but is less studied in 3D video object detection, because moving objects violate multi-view geometry constraints and are treated as outliers during scene reconstruction. We address this issue by treating objects as first-class citizens during correspondence-based optimization. In this work, we propose BA-Det, an end-to-end optimizable object detector with object-centric temporal correspondence learning and featuremetric object bundle adjustment. Empirically, we verify the effectiveness and efficiency of BA-Det for multiple baseline 3D detectors under various setups. Our BA-Det achieves SOTA performance on the…
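The abstract's point that moving objects violate multi-view geometry constraints can be illustrated with a small geometric sketch. It is not BA-Det's learned feature-metric objective. It only compares pixel reprojection error under a static-scene assumption with the error obtained when the points are expressed in the object's own frame and given a per-frame object pose. The intrinsics, object points, and motion below are hypothetical values chosen for illustration.

```python
import numpy as np

def project(K, points_cam):
    """Pinhole projection of Nx3 camera-frame points to Nx2 pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_rmse(K, points, rotations, translations, observations):
    """Pixel RMSE between observed and predicted 2D points over all frames."""
    sq_errors = []
    for R, t, obs in zip(rotations, translations, observations):
        pred = project(K, points @ R.T + t)
        sq_errors.append(np.sum((pred - obs) ** 2, axis=1))
    return float(np.sqrt(np.mean(np.concatenate(sq_errors))))

# Hypothetical camera intrinsics and four points fixed in the object frame (metres).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
object_points = np.array([[-1.0, -0.5, 0.0],
                          [1.0, -0.5, 0.0],
                          [1.0, 0.5, 0.5],
                          [-1.0, 0.5, 0.5]])

# The camera is static; the object moves away along z and drifts along x.
num_frames = 5
identity = np.eye(3)
object_rotations = [identity] * num_frames
object_translations = [np.array([0.5 * i, 0.0, 10.0 + 2.0 * i])
                       for i in range(num_frames)]

# Simulated noise-free 2D observations of the moving object.
observations = [project(K, object_points @ R.T + t)
                for R, t in zip(object_rotations, object_translations)]

# Static-scene assumption: the points are frozen at their frame-0 position.
static_rmse = reprojection_rmse(
    K, object_points,
    [identity] * num_frames, [object_translations[0]] * num_frames,
    observations)

# Object-centric formulation: points live in the object frame, one pose per frame.
object_rmse = reprojection_rmse(
    K, object_points, object_rotations, object_translations, observations)

print(f"static-scene RMSE:   {static_rmse:.2f} px")
print(f"object-centric RMSE: {object_rmse:.2f} px")
```

Under the static-scene assumption the residual grows with the object's motion, which is why such points are rejected as outliers in scene reconstruction. With per-frame object poses the same correspondences are consistent and can constrain an optimization.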
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
