Forecasting of depth and ego-motion with transformers and self-supervision

Houssem Boulahbal; Adrian Voicila; Andrew Comport

arXiv:2206.07435 · cs.CV · June 16, 2022

Open Access

TL;DR

This paper introduces a self-supervised method combining CNNs and transformers to forecast depth and ego-motion from raw image sequences, achieving competitive results without requiring annotated data.

Contribution

It proposes a novel architecture that leverages both convolutional and transformer modules for self-supervised depth and ego-motion forecasting from raw images.

Findings

01 Performs well on the KITTI benchmark

02 Achieves results comparable to supervised methods

03 Uses only raw images without annotations

Abstract

This paper addresses the problem of end-to-end self-supervised forecasting of depth and ego-motion. Given a sequence of raw images, the aim is to forecast both the geometry and the ego-motion using a self-supervised photometric loss. The architecture is designed using both convolution and transformer modules. This leverages the benefits of both: the inductive bias of CNNs and the multi-head attention of transformers, yielding a rich spatio-temporal representation that enables accurate depth forecasting. Prior work attempts to solve this problem using multi-modal input/output with supervised ground-truth data, which is impractical because it requires a large annotated dataset. In contrast to prior methods, this paper forecasts depth and ego-motion using only raw images as input, trained with self-supervision. The approach performs significantly well on the KITTI dataset benchmark with several…
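As a rough illustration of the self-supervised photometric loss mentioned in the abstract, the sketch below reprojects a source frame into the target view using a depth map and a relative pose, then measures the mean absolute intensity difference. It is not the authors' implementation: the function names (`warp_source_to_target`, `photometric_l1`), the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the nearest-neighbour sampling, and the plain L1 penalty are all assumptions chosen to keep the example short. The abstract does not state the exact loss terms the paper uses.

```python
def warp_source_to_target(source, depth, fx, fy, cx, cy, R, t):
    """Synthesize the target view by sampling `source` at reprojected pixels.

    source, depth: H x W nested lists (grayscale intensities, target-frame depth).
    fx, fy, cx, cy: pinhole intrinsics (hypothetical camera).
    R (3x3), t (3): rigid transform taking target-frame points to the source frame.
    Returns (warped, valid), both H x W nested lists.
    """
    height, width = len(depth), len(depth[0])
    warped = [[0.0] * width for _ in range(height)]
    valid = [[False] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            d = depth[v][u]
            # Back-project pixel (u, v) to a 3D point in the target camera frame.
            point = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Move the point into the source camera frame.
            moved = [sum(R[i][j] * point[j] for j in range(3)) + t[i]
                     for i in range(3)]
            if moved[2] <= 1e-6:
                continue  # point is behind the source camera
            # Project into the source image (nearest-neighbour sampling).
            us = round(fx * moved[0] / moved[2] + cx)
            vs = round(fy * moved[1] / moved[2] + cy)
            if 0 <= us < width and 0 <= vs < height:
                warped[v][u] = source[vs][us]
                valid[v][u] = True
    return warped, valid


def photometric_l1(target, warped, valid):
    """Mean absolute intensity difference over pixels with a valid reprojection."""
    total, count = 0.0, 0
    for t_row, w_row, m_row in zip(target, warped, valid):
        for t_px, w_px, ok in zip(t_row, w_row, m_row):
            if ok:
                total += abs(t_px - w_px)
                count += 1
    return total / count if count else 0.0
```

In a forecasting setting, `depth`, `R`, and `t` would be network outputs for a future frame, and the loss would be minimized by gradient descent in a differentiable framework with bilinear rather than nearest-neighbour sampling. No ground-truth depth or pose appears anywhere in the loss, which is what makes the training signal self-supervised.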

Taxonomy

Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications

Methods: Softmax · Linear Layer · Convolution