Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

Guanglei Yang; Hao Tang; Mingli Ding; Nicu Sebe; Elisa Ricci

arXiv:2103.12091 · cs.CV · August 6, 2021 · 6 citations

TL;DR

This paper introduces TransDepth, a novel architecture combining CNNs and transformers with a gated attention decoder, achieving state-of-the-art results in continuous pixel-wise prediction tasks like depth and surface normal estimation.

Contribution

According to the authors, this is the first work to apply transformers to pixel-wise prediction with continuous labels. It integrates a gated attention decoder to preserve local details.

Findings

01. Achieves state-of-the-art performance on three datasets

02. Effectively models long-range dependencies in pixel-wise tasks

03. Demonstrates the benefit of combining CNNs and transformers

Abstract

While convolutional neural networks have shown a tremendous impact on various computer vision tasks, they generally demonstrate limitations in explicitly modeling long-range dependencies due to the intrinsic locality of the convolution operation. Initially designed for natural language processing tasks, Transformers have emerged as alternative architectures with innate global self-attention mechanisms to capture long-range dependencies. In this paper, we propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. To avoid the network losing its ability to capture local-level details due to the adoption of transformers, we propose a novel decoder that employs attention mechanisms based on gates. Notably, this is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth…
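
The contrast the abstract draws between local convolution and global self-attention can be illustrated with a minimal NumPy sketch. This is not the TransDepth implementation: the feature-map size (`h`, `w`, `d`), the random projection matrices, and the function name `self_attention` are all assumptions chosen for illustration. It shows generic single-head scaled dot-product self-attention applied to a flattened feature map, where every output position receives a non-zero weight from every input position.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: array of shape (n_tokens, d_model), one row per spatial position.
    Returns the attended features and the (n_tokens, n_tokens) weight matrix.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical CNN feature map: 4x4 spatial grid with 8 channels.
rng = np.random.default_rng(0)
h, w, d = 4, 4, 8
feature_map = rng.standard_normal((h, w, d))

# Flatten the grid into a sequence of h*w tokens, as a transformer would see it.
tokens = feature_map.reshape(h * w, d)

# Illustrative random projections (a trained model would learn these).
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out, attn = self_attention(tokens, wq, wk, wv)

# Perturb only the bottom-right position and recompute.
perturbed = tokens.copy()
perturbed[-1] += 1.0
out_perturbed, _ = self_attention(perturbed, wq, wk, wv)

# The top-left output changes even though the perturbed position is at the
# opposite corner, which a single 3x3 convolution could not achieve.
top_left_changed = not np.allclose(out[0], out_perturbed[0])
```

A single 3x3 convolution at the top-left position would only read its immediate neighbours, so the same perturbation would leave its output unchanged. That locality is the limitation the abstract attributes to CNNs, and the reason the decoder is described as needing a separate mechanism to retain local detail.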

Code & Models

Repositories

ygjwd12345/TransDepth (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Convolution