A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation

Jan-Aike Termöhlen, Timo Bartels, Tim Fingscheidt

arXiv:2308.13331 · cs.CV · August 28, 2023

TL;DR

This paper introduces ReVT, a re-parameterized vision transformer approach that enhances domain generalization in semantic segmentation, achieving state-of-the-art results with fewer parameters and no extra inference cost.

Contribution

The paper proposes an augmentation-driven approach to domain-generalized semantic segmentation in which multiple trained models are combined into a single re-parameterized vision transformer (ReVT) by averaging their weights after training.
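
A minimal sketch of weight averaging after training, under stated assumptions. All models are assumed to share one architecture (identical parameter names and sizes), and each model's weights are represented as a mapping from parameter name to a flat list of floats, a framework-agnostic stand-in for a PyTorch `state_dict`. The function `average_weights` and the parameter names are hypothetical and are not taken from the ReVT repository. The sketch also does not say which parts of the network ReVT actually averages.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats
# (a framework-agnostic stand-in for a PyTorch state_dict).
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Return the element-wise mean of the parameters of several models.

    Assumes every model has the same parameter names and the same
    number of values per parameter (i.e. identical architectures).
    """
    if not models:
        raise ValueError("need at least one model to average")
    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must have identical parameter names")
    n = len(models)
    averaged: Weights = {}
    for name in names:
        tensors = [m[name] for m in models]
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Mean over models, one position at a time.
        averaged[name] = [sum(values) / n for values in zip(*tensors)]
    return averaged


# Usage: three models, e.g. trained with different augmentations (illustrative values).
model_a = {"encoder.w": [1.0, 2.0], "encoder.b": [0.0]}
model_b = {"encoder.w": [3.0, 4.0], "encoder.b": [1.5]}
model_c = {"encoder.w": [2.0, 0.0], "encoder.b": [3.0]}
print(average_weights([model_a, model_b, model_c]))
# prints: {'encoder.w': [2.0, 2.0], 'encoder.b': [1.5]}
```

Element-wise averaging is only meaningful when the models' weights correspond to one another. A common assumption, made here as well, is that all models were fine-tuned from the same pre-trained initialization.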

Findings

1. Achieves 47.3% mIoU with small models, surpassing the prior state of the art (46.3%).
2. Achieves 50.1% mIoU with midsized models, surpassing the prior state of the art (47.8%).
3. Requires fewer parameters and reaches a higher frame rate than the best prior art.
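
The findings above are reported as mean intersection over union (mIoU). The following sketch shows the standard computation: per-class IoU = TP / (TP + FP + FN), read off a confusion matrix and averaged over classes. Some choices here are assumptions rather than the paper's evaluation code. Classes that appear in neither the ground truth nor the prediction are excluded from the mean, and void pixels are skipped through a hypothetical `ignore_index` parameter.

```python
from typing import List, Optional, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int,
                     ignore_index: Optional[int] = None) -> List[List[int]]:
    """Count (ground-truth class, predicted class) pairs over all pixels."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled / void pixels
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean IoU over classes that occur in the ground truth or the prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no class occurs in prediction or ground truth")
    return sum(ious) / len(ious)


# Usage: a flattened 8-pixel "image" with 3 classes; 255 marks an ignored pixel.
target = [0, 0, 1, 1, 2, 2, 255, 1]
pred = [0, 1, 1, 1, 2, 0, 2, 1]
cm = confusion_matrix(pred, target, num_classes=3, ignore_index=255)
print(f"mIoU = {100 * mean_iou(cm):.1f}%")  # prints: mIoU = 52.8%
```

Here the per-class IoUs are 1/3, 3/4 and 1/2, so the mean is about 0.528.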

Abstract

The task of semantic segmentation requires a model to assign semantic labels to each pixel of an image. However, the performance of such models degrades when deployed in an unseen domain with different data distributions compared to the training domain. We present a new augmentation-driven approach to domain generalization for semantic segmentation using a re-parameterized vision transformer (ReVT) with weight averaging of multiple models after training. We evaluate our approach on several benchmark datasets and achieve state-of-the-art mIoU performance of 47.3% (prior art: 46.3%) for small models and of 50.1% (prior art: 47.8%) for midsized models on commonly used benchmark datasets. At the same time, our method requires fewer parameters and reaches a higher frame rate than the best prior art. It is also easy to implement and, unlike network ensembles, does not add any computational…
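
The contrast with network ensembles drawn in the abstract can be made concrete with a toy model. An ensemble of N members needs N forward passes per input and averages their outputs. A weight-averaged model needs a single pass. The one-layer ReLU "network", its weights, and the global forward-pass counter are illustrative assumptions and do not reproduce ReVT's architecture.

```python
from typing import List

calls = {"forward": 0}  # counts forward passes, for illustration only


def forward(weights: List[float], x: List[float]) -> float:
    """Toy model: ReLU(w . x). Increments the forward-pass counter."""
    calls["forward"] += 1
    return max(0.0, sum(w * xi for w, xi in zip(weights, x)))


def ensemble_predict(members: List[List[float]], x: List[float]) -> float:
    # One forward pass per ensemble member, then average the outputs.
    return sum(forward(w, x) for w in members) / len(members)


def averaged_predict(members: List[List[float]], x: List[float]) -> float:
    # Average the weights once, then run a single forward pass.
    avg = [sum(ws) / len(members) for ws in zip(*members)]
    return forward(avg, x)


members = [[1.0, 0.25], [0.75, 0.5], [1.25, 0.0]]
x = [1.0, 2.0]

calls["forward"] = 0
ens = ensemble_predict(members, x)
ens_calls = calls["forward"]

calls["forward"] = 0
avg = averaged_predict(members, x)
avg_calls = calls["forward"]

print(ens_calls, avg_calls, ens, avg)  # prints: 3 1 1.5 1.5
```

The two predictions coincide here only because every member stays in the ReLU's linear region for this input. For a general nonlinear network they differ, so weight averaging is not an exact substitute for ensembling. It does, however, keep the inference cost of a single model.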

Code & Models

Repositories

ifnspaml/revt (official, PyTorch)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Multi-Head Attention · Residual Connection · Vision Transformer