SalientDSO: Bringing Attention to Direct Sparse Odometry

Huai-Jen Liang; Nitin J. Sanket; Cornelia Ferm\"uller; Yiannis; Aloimonos

arXiv:1803.00127·cs.CV·March 2, 2018

SalientDSO: Bringing Attention to Direct Sparse Odometry

Huai-Jen Liang, Nitin J. Sanket, Cornelia Ferm\"uller, Yiannis, Aloimonos

PDF

1 Repo

TL;DR

SalientDSO introduces a novel method that integrates visual saliency and scene parsing into direct sparse visual odometry, significantly improving robustness and accuracy in cluttered indoor scenes by focusing on salient features.

Contribution

This paper is the first to incorporate deep learning-based visual saliency and scene parsing into direct sparse VO, enhancing feature selection and robustness.

Findings

01

Outperforms DSO and ORB-SLAM on standard datasets

02

Achieves accurate VO with as few as 40 features per frame

03

Provides a new dataset with indoor cluttered sequences

Abstract

Although cluttered indoor scenes have a lot of useful high-level semantic information which can be used for mapping and localization, most Visual Odometry (VO) algorithms rely on the usage of geometric features such as points, lines and planes. Lately, driven by this idea, the joint optimization of semantic labels and obtaining odometry has gained popularity in the robotics community. The joint optimization is good for accurate results but is generally very slow. At the same time, in the vision community, direct and sparse approaches for VO have stricken the right balance between speed and accuracy. We merge the successes of these two communities and present a way to incorporate semantic information in the form of visual saliency to Direct Sparse Odometry - a highly successful direct sparse VO algorithm. We also present a framework to filter the visual saliency based on scene parsing.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

prgumd/SalientDSO
tfOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings