Robust Monocular SLAM for Egocentric Videos
Suvam Patra, Kartikeya Gupta, Faran Ahmad, Chetan Arora, Subhashis Banerjee

TL;DR
This paper introduces a robust SLAM method for egocentric videos that reformulates the problem as batch structure from motion over sliding temporal windows, using rotation and translation averaging to improve stability and accuracy.
Contribution
The authors propose a novel SLAM approach inspired by batch SFM techniques, addressing failures of current methods in egocentric videos through pose initialization and stabilization strategies.
Findings
Successfully handles long, shaky egocentric videos
Outperforms state-of-the-art SLAM techniques on benchmark datasets
Provides both qualitative and quantitative validation
Abstract
Despite tremendous progress, a truly general-purpose pipeline for Simultaneous Localization and Mapping (SLAM) remains a challenge. We investigate the reported failure of state-of-the-art (SOTA) SLAM techniques on egocentric videos. We find that dominant 3D rotations, low parallax between successive frames, and primarily forward motion are the most common causes of failure in egocentric videos. The incremental nature of SOTA SLAM, combined with unreliable pose and 3D estimates and no opportunities for global loop closure, produces drift and eventually causes these techniques to fail. Taking inspiration from batch-mode Structure from Motion (SFM) techniques, we propose to solve SLAM as an SFM problem over sliding temporal windows. This makes the problem well constrained. Further, we propose to initialize the camera poses using 2D…
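As a rough illustration of the sliding-temporal-window idea described in the abstract, the sketch below splits a video's frame indices into overlapping windows, each of which could then be handed to a batch SFM solver. The function name `sliding_windows` and the `window` and `stride` parameters are hypothetical choices made for this example; the window size and overlap the authors actually use are not stated here.

```python
from typing import Iterator, List


def sliding_windows(num_frames: int, window: int, stride: int) -> Iterator[List[int]]:
    """Yield overlapping lists of frame indices covering 0..num_frames-1.

    `window` and `stride` are illustrative parameters, not values from the paper.
    Consecutive windows overlap only when stride < window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield list(range(start, end))
        if end == num_frames:
            # The last window already reaches the final frame.
            break
        start += stride


# Example: 10 frames, windows of 4 frames, advancing 2 frames at a time.
windows = list(sliding_windows(num_frames=10, window=4, stride=2))
for w in windows:
    print(w)
```

Keeping `stride` smaller than `window` makes adjacent windows share frames, which is one plausible way to keep per-window reconstructions in a common frame of reference; how the paper aligns windows is not covered in this summary.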
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
