# SGANVO: Unsupervised Deep Visual Odometry and Depth Estimation with Stacked Generative Adversarial Networks

**Authors:** Tuo Feng, Dongbing Gu

arXiv: 1906.08889 · 2019-07-23

## TL;DR

SGANVO introduces a novel unsupervised deep learning framework using stacked GANs for improved visual depth and ego-motion estimation, demonstrating superior or comparable results on the KITTI dataset.

## Contribution

The paper presents a stacked GAN architecture in which the lowest layer jointly estimates depth and ego-motion and the higher layers estimate spatial features. A recurrent representation across the layers lets the network capture temporal dynamics.

## Key findings

- Achieves better or comparable depth estimation results on the KITTI dataset.
- Reports better or comparable ego-motion estimation results on the same dataset.
- Captures temporal dynamics through a recurrent representation across the GAN layers.
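
To make "better or comparable depth estimation" concrete, the sketch below computes the error measures commonly reported for monocular depth on KITTI (absolute relative error, squared relative error, RMSE, log RMSE, and the δ < 1.25 accuracy). This summary does not say which metrics the paper uses, so treat the metric choice as an assumption. The function name `depth_metrics` and the 80 m depth cap are also illustrative, not taken from the paper.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Compute common monocular-depth error metrics over valid pixels.

    pred, gt: flat sequences of depths in metres, same length.
    Pixels whose ground truth lies outside (min_depth, max_depth) are
    skipped; the default cap of 80 m is an assumed, commonly used value.
    """
    # Keep only pixels with valid ground truth; clamp predictions to range.
    pairs = [
        (min(max(p, min_depth), max_depth), g)
        for p, g in zip(pred, gt)
        if min_depth < g < max_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # Fraction of pixels whose ratio to ground truth is within 1.25.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta1": delta1,
    }
```

Calling `depth_metrics(predicted_depths, lidar_depths)` on flattened per-pixel depths returns a dictionary of the five values. Lower is better for the four error terms and higher is better for `delta1`.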

## Abstract

Recently, end-to-end unsupervised deep learning methods have surpassed geometric methods on visual depth and ego-motion estimation tasks. These data-driven learning methods perform more robustly and accurately in some challenging scenes. The encoder-decoder network has been widely used in depth estimation, and the RCNN has brought significant improvements in ego-motion estimation. Furthermore, the latest use of Generative Adversarial Nets (GANs) in depth and ego-motion estimation has demonstrated that the estimation can be further improved by generating pictures in the game learning process. This paper proposes a novel unsupervised network system for visual depth and ego-motion estimation: the Stacked Generative Adversarial Network (SGANVO). It consists of a stack of GAN layers, of which the lowest layer estimates the depth and ego-motion while the higher layers estimate the spatial features. It can also capture the temporal dynamics thanks to a recurrent representation across the layers. See Fig. 1 for details. We select the most commonly used KITTI [1] dataset for evaluation. The evaluation results show that our proposed method can produce better or comparable results in depth and ego-motion estimation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08889/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08889/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1906.08889/full.md

---
Source: https://tomesphere.com/paper/1906.08889