TL;DR
SalientDSO introduces a novel method that integrates visual saliency and scene parsing into direct sparse visual odometry, significantly improving robustness and accuracy in cluttered indoor scenes by focusing on salient features.
Contribution
This paper is the first to incorporate deep learning-based visual saliency and scene parsing into direct sparse VO, enhancing feature selection and robustness.
Findings
Outperforms DSO and ORB-SLAM on standard datasets
Achieves accurate VO with as few as 40 features per frame
Provides a new dataset with indoor cluttered sequences
Abstract
Although cluttered indoor scenes have a lot of useful high-level semantic information which can be used for mapping and localization, most Visual Odometry (VO) algorithms rely on the usage of geometric features such as points, lines and planes. Lately, driven by this idea, the joint optimization of semantic labels and obtaining odometry has gained popularity in the robotics community. The joint optimization is good for accurate results but is generally very slow. At the same time, in the vision community, direct and sparse approaches for VO have stricken the right balance between speed and accuracy. We merge the successes of these two communities and present a way to incorporate semantic information in the form of visual saliency to Direct Sparse Odometry - a highly successful direct sparse VO algorithm. We also present a framework to filter the visual saliency based on scene parsing.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
