Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, Erik Learned-Miller

TL;DR
This paper introduces a novel, efficient method for estimating camera rotation in crowded scenes from monocular video, achieving high accuracy and robustness where previous methods struggled, supported by a new dataset and benchmark.
Contribution
We propose a new generalization of the Hough transform on SO(3) for robust camera rotation estimation in crowded scenes, outperforming existing methods in accuracy and speed.
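To make the voting idea concrete, here is a minimal sketch of Hough-style voting over a discretized rotation space. It is not the authors' implementation: this summary does not describe their parameterization of SO(3) or their voting scheme. The sketch assumes a first-order (small-angle) rotational flow model in normalized image coordinates, with one common sign convention, and it ignores camera translation. Under that model each optical flow vector is compatible with a one-dimensional family of rotations, so every flow vector votes along a line in a 3D accumulator and the most-voted bin is taken as the estimate. The names `rotational_flow`, `hough_rotation`, `max_angle`, and `bins` are hypothetical.

```python
import numpy as np


def rotational_flow(x, y, omega, f=1.0):
    """Flow at image point (x, y) induced by a small rotation omega = (wx, wy, wz).

    Assumption: first-order model with one common sign convention; translation ignored.
    """
    wx, wy, wz = omega
    u = (x * y / f) * wx - (f + x**2 / f) * wy + y * wz
    v = (f + y**2 / f) * wx - (x * y / f) * wy - x * wz
    return u, v


def hough_rotation(points, flows, f=1.0, max_angle=0.05, bins=41):
    """Return the center of the most-voted bin in a (wx, wy, wz) accumulator."""
    centers = np.linspace(-max_angle, max_angle, bins)
    step = centers[1] - centers[0]
    acc = np.zeros((bins, bins, bins), dtype=np.int64)

    x, y = points[:, 0], points[:, 1]
    u, v = flows[:, 0], flows[:, 1]

    # Coefficients of the 2x3 linear system A(x, y) @ omega = (u, v).
    a11, a12, a13 = x * y / f, -(f + x**2 / f), y
    a21, a22, a23 = f + y**2 / f, -(x * y / f), -x
    det = a11 * a22 - a12 * a21  # equals f^2 + x^2 + y^2, always positive

    # Each flow vector constrains omega to a line; walk that line by sweeping wz
    # and solving the remaining 2x2 system for (wx, wy) with Cramer's rule.
    for k, wz in enumerate(centers):
        bu = u - a13 * wz
        bv = v - a23 * wz
        wx = (bu * a22 - a12 * bv) / det
        wy = (a11 * bv - a21 * bu) / det
        i = np.rint((wx + max_angle) / step).astype(int)
        j = np.rint((wy + max_angle) / step).astype(int)
        ok = (i >= 0) & (i < bins) & (j >= 0) & (j < bins)
        np.add.at(acc, (i[ok], j[ok], np.full(ok.sum(), k)), 1)

    best = np.unravel_index(np.argmax(acc), acc.shape)
    return centers[list(best)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1.0, 1.0, size=(300, 2))
    omega_true = np.array([0.01, -0.02, 0.005])
    u, v = rotational_flow(pts[:, 0], pts[:, 1], omega_true)
    flows = np.stack([u, v], axis=1)
    # Replace half of the flow vectors with random ones to mimic moving objects.
    flows[150:] = rng.uniform(-0.1, 0.1, size=(150, 2))
    print(hough_rotation(pts, flows))
```

Because flow vectors on independently moving objects vote along unrelated lines, they tend not to pile up in a single bin, which is why a voting scheme of this kind can tolerate many outliers without an iterative sampling loop. The grid resolution (`bins`) trades accuracy against memory and time in this sketch.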
Findings
Our method reduces rotation estimation error by nearly 50% compared to the next best approach.
It is more accurate than every method it is compared against, irrespective of that method's speed.
The approach is effective in crowded, real-world scenes, as demonstrated on a new dataset.
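For readers unfamiliar with how rotation estimation error is typically measured, the following sketch computes the geodesic angle between an estimated and a ground-truth rotation matrix. This is a common choice, but this summary does not state which error metric the paper reports, so treat the metric and the helper names `rotation_error_deg` and `rot_z` as assumptions.

```python
import numpy as np


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_rel = R_est.T @ R_gt
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def rot_z(deg):
    """Rotation about the z axis by `deg` degrees (helper for the example)."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    print(rotation_error_deg(rot_z(10.0), rot_z(12.0)))
```

The metric is symmetric in its two arguments and is zero only when the two rotations coincide.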
Abstract
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. Methods developed for wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video. On the other hand, methods used in autonomous driving (e.g., SLAM) leverage specific sensor setups, specific motion models, or local optimization strategies (lagging batch processing) and do not generalize well to handheld video. Finally, for dynamic scenes, commonly used robustification techniques like RANSAC require large numbers of iterations, and become…
Videos
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes · YouTube
Taxonomy
Topics: Advanced Vision and Imaging · Image and Object Detection Techniques · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
