# Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space

**Authors:** Xiaochuan Yin, Chengju Liu

arXiv: 1908.01367 · 2019-08-06

## TL;DR

This paper introduces DFO, an unsupervised framework that learns depth and hierarchical feature representations from monocular videos in a metric space. It then estimates camera motion directly on those features instead of regressing it with a pose network.

## Contribution

The paper proposes DFO, a direct feature odometry framework that learns hierarchical features without supervision and computes pose with a direct method rather than a pose-regression network. This makes it compatible with traditional SLAM pipelines.

## Key findings

- Effective depth and feature learning demonstrated on the KITTI dataset
- Computing pose directly, instead of regressing it, constrains the scale factor of translation to stay consistent
- Compatible with existing SLAM pipelines
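The scale-consistency finding rests on a standard property of monocular geometry. If every 3-D point and the translation are scaled by the same factor `s`, all pixel projections stay unchanged, so images alone cannot fix the metric scale. The sketch below is a minimal numerical illustration, not code from the paper; the points, intrinsics, and pose are hypothetical. When translation is solved against a predicted depth map rather than regressed independently, it inherits that depth map's scale. That is the mechanism by which a direct method can keep translation scale tied to depth.

```python
import numpy as np

def project(points, R, t, K):
    """Pinhole projection of (N, 3) points after applying the rigid motion (R, t)."""
    cam = points @ R.T + t          # points in the second camera frame
    pix = cam @ K.T                 # homogeneous pixel coordinates
    return pix[:, :2] / pix[:, 2:3] # perspective divide

# Hypothetical intrinsics, scene points (metres), and camera motion.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
points = np.array([[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.2, 0.8, 9.0]])
angle = 0.05                                     # small yaw, radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.3, 0.0, 0.5])

for s in (0.5, 1.0, 2.0):                        # candidate global scales
    uv = project(s * points, R, s * t, K)        # depth and translation scaled together
    print(s, np.round(uv, 3))                    # identical pixels for every s
```

Scaling only `t` while keeping the depths fixed does change the projections, which is why fixing depth fixes the translation scale.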

## Abstract

For ego-motion estimation, the feature representation of the scene is crucial. Previous work indicates that both low-level and semantic feature-based methods can achieve promising results, so a hierarchical feature representation may combine the benefits of both. From this perspective, we propose a novel direct feature odometry framework, named DFO, for depth estimation and hierarchical feature representation learning from monocular videos. By exploiting the metric distance, our framework learns the hierarchical feature representation without supervision. The pose is obtained with a coarse-to-fine approach, from high-level to low-level features, in enlarged feature maps. A pixel-level attention mask is self-learned to provide prior information. In contrast to previous methods, our method calculates the camera motion with a direct method rather than regressing the ego-motion with a pose network. This allows the scale factor of translation to be constrained to remain consistent. The method is also compatible with the traditional SLAM pipeline. Experiments on the KITTI dataset demonstrate the effectiveness of our method.
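The abstract describes computing camera motion with a direct method on learned feature maps, weighted by a self-learned attention mask. Below is a minimal sketch of one such alignment step. It assumes the depth map, feature maps, and per-pixel weights are already given; here they are synthetic stand-ins, not network outputs. The sketch runs Gauss-Newton over a 6-DoF pose (rotation vector plus translation) and minimizes attention-weighted feature residuals between back-projected reference pixels and bilinearly sampled target features. All function names are hypothetical. This is not the authors' implementation, and the "feature map" is a hand-made sinusoid rather than a learned representation.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: rotation vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    Kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def bilinear(F, u, v):
    """Sample an (H, W, C) feature map at float pixel coords; u = column, v = row."""
    H, W, _ = F.shape
    u = np.clip(u, 0.0, W - 1.001)
    v = np.clip(v, 0.0, H - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    return ((1 - du) * (1 - dv) * F[v0, u0] + du * (1 - dv) * F[v0, u0 + 1]
            + (1 - du) * dv * F[v0 + 1, u0] + du * dv * F[v0 + 1, u0 + 1])

def backproject(u, v, depth, K):
    """Lift pixels with depth to (N, 3) points in the reference camera frame."""
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)])
    return (rays * depth).T

def project(xi, pts, K):
    """Apply pose xi = (rotation vector, translation) to pts and project to pixels."""
    Xc = pts @ so3_exp(xi[:3]).T + xi[3:]
    p = Xc @ K.T
    return p[:, 0] / p[:, 2], p[:, 1] / p[:, 2]

def residuals(xi, feats_ref, pts, F_tgt, K, weights):
    """Attention-weighted feature-metric residuals, flattened to (N * C,)."""
    u, v = project(xi, pts, K)
    r = bilinear(F_tgt, u, v) - feats_ref
    return (np.sqrt(weights)[:, None] * r).ravel()

def align(feats_ref, pts, F_tgt, K, weights, xi0=None, iters=30, eps=1e-6):
    """Gauss-Newton on the pose 6-vector with a forward-difference Jacobian."""
    xi = np.zeros(6) if xi0 is None else np.asarray(xi0, dtype=float).copy()
    args = (feats_ref, pts, F_tgt, K, weights)
    for _ in range(iters):
        r = residuals(xi, *args)
        J = np.empty((r.size, 6))
        for j in range(6):
            step = np.zeros(6)
            step[j] = eps
            J[:, j] = (residuals(xi + step, *args) - r) / eps
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        xi = xi + delta                          # additive update, fine for small motion
        if np.linalg.norm(delta) < 1e-10:
            break
    return xi

# Synthetic check (hypothetical data): a smooth 3-channel "feature map".
H = W = 64
K = np.array([[50.0, 0.0, 32.0], [0.0, 50.0, 32.0], [0.0, 0.0, 1.0]])
vv, uu = np.mgrid[0:H, 0:W].astype(float)
F_tgt = np.stack([np.sin(uu / 6.0), np.cos(vv / 7.0), np.sin((uu + vv) / 9.0)], axis=-1)

gu, gv = np.meshgrid(np.arange(12.0, 52.0, 2.0), np.arange(12.0, 52.0, 2.0))
u_ref, v_ref = gu.ravel(), gv.ravel()
depth = 2.0 + 0.03 * u_ref                       # slanted plane as a stand-in depth map
pts = backproject(u_ref, v_ref, depth, K)

xi_gt = np.array([0.01, -0.02, 0.005, 0.10, -0.05, 0.08])
feats_ref = bilinear(F_tgt, *project(xi_gt, pts, K))  # features at the true matches
weights = np.random.default_rng(0).uniform(0.5, 1.0, size=len(pts))  # stand-in mask

xi_est = align(feats_ref, pts, F_tgt, K, weights)
print(np.round(xi_est, 4))
```

In a coarse-to-fine scheme like the one the abstract describes, `align` would be called once per level, from the coarsest (high-level) feature map to the finest. Each call would pass the previous level's estimate as `xi0`, with `K` rescaled to each map's resolution. The additive rotation-vector update and the numerical Jacobian only keep the sketch short. A practical solver would typically use analytic Jacobians and an on-manifold SE(3) update.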

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.01367/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1908.01367/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1908.01367/full.md

---
Source: https://tomesphere.com/paper/1908.01367