Instance-aware multi-object self-supervision for monocular depth prediction

Houssem Boulahbal; Adrian Voicila; Andrew Comport

arXiv:2203.00809 · cs.CV · August 10, 2022

TL;DR

This paper introduces a self-supervised monocular depth prediction method that effectively models dynamic objects using transformer-based attention, outperforming existing methods on benchmarks.

Contribution

It presents a novel self-supervised framework incorporating transformer multi-head attention to handle dynamic objects in monocular depth prediction.
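As a rough illustration of the multi-head attention mentioned above, the sketch below lets object embeddings from one frame attend to object embeddings from another. It is a generic scaled dot-product attention in NumPy, not the paper's network: the learned query/key/value projections are omitted, and the function name, the embedding shapes, and the idea of feeding per-object feature vectors are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Hypothetical helper. q: (Nq, d); k, v: (Nk, d); d divisible by num_heads.

    Learned linear projections are left out to keep the sketch short.
    """
    n_q, d = q.shape
    d_head = d // num_heads

    def split(x):
        # (N, d) -> (heads, N, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    # Attention weights per head: (heads, Nq, Nk); each row sums to 1.
    weights = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head))
    # Weighted sum of values, then merge heads back to (Nq, d).
    out = (weights @ vh).transpose(1, 0, 2).reshape(n_q, d)
    return out, weights

# Toy usage: three objects at time t (queries) and the same three at t+1 (keys).
objs_t = 10.0 * np.eye(3)
objs_t1 = 10.0 * np.eye(3)
out, weights = multi_head_attention(objs_t, objs_t1, objs_t1, num_heads=1)
```

In this toy case each query puts most of its weight on the key with the matching embedding, which is the sense in which attention can act as a soft matching between object instances across frames.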

Findings

01

Outperforms prior methods that account for dynamic objects on standard benchmarks.

02

Effectively models dynamic objects with transformer attention.

03

Competitive with video-to-depth prediction frameworks.

Abstract

This paper proposes a self-supervised monocular image-to-depth prediction framework that is trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances. Self-supervision is performed by warping the images across a video sequence using depth and scene motion, including object instances. One novelty of the proposed method is the use of the multi-head attention of the transformer network, which matches moving objects across time and models their interaction and dynamics. This enables accurate and robust pose estimation for each object instance. Most image-to-depth prediction frameworks assume rigid scenes, which largely degrades their performance on dynamic objects. Only a few SOTA papers have accounted for dynamic objects. The proposed method is shown to outperform these methods on standard…
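The warping step described in the abstract can be sketched as follows, assuming a standard pinhole camera model. Each target pixel is back-projected with its depth, moved by a rigid transform, and projected into the source frame; pixels inside an instance mask additionally receive that object's own rigid motion. The function name `reproject`, the mask inputs, and the order in which object and camera motion are composed are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def reproject(depth, K, T_cam, masks=(), T_objs=()):
    """Hypothetical helper: source-frame pixel coordinates for each target pixel.

    depth:  (H, W) depth of the target frame
    K:      (3, 3) pinhole intrinsics
    T_cam:  (4, 4) rigid transform, target camera -> source camera
    masks:  iterable of (H, W) boolean instance masks
    T_objs: iterable of (4, 4) per-instance rigid motions (assumed to be
            applied before the camera motion)
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (H, W, 3)

    # Back-project: X = D * K^{-1} [u, v, 1]^T
    pts = depth[..., None] * (pix @ np.linalg.inv(K).T)
    pts_h = np.concatenate([pts, np.ones((H, W, 1))], axis=-1)      # (H, W, 4)

    # One transform per pixel: camera motion everywhere,
    # composed with the object motion inside each instance mask.
    T = np.broadcast_to(T_cam, (H, W, 4, 4)).copy()
    for mask, T_obj in zip(masks, T_objs):
        T[mask] = T_cam @ T_obj

    moved = np.einsum("hwij,hwj->hwi", T, pts_h)[..., :3]
    proj = moved @ K.T
    return proj[..., :2] / proj[..., 2:3]  # (H, W, 2) as (u, v)

# Toy usage: static camera, one object translating 0.1 units along x.
depth = np.full((4, 4), 2.0)
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
T_obj = np.eye(4)
T_obj[0, 3] = 0.1
coords = reproject(depth, K, np.eye(4), [mask], [T_obj])
```

A photometric loss would then sample the source image at `coords` (for example with bilinear interpolation) and compare the result with the target image; that sampling step is left out here.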

Taxonomy

Methods: Softmax · Linear Layer