RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Jian Wang; Chenhui Gou; Qiman Wu; Haocheng Feng; Junyu Han; Errui Ding; Jingdong Wang

arXiv:2210.07124·cs.CV·October 14, 2022·76 cites

TL;DR

RTFormer is a novel dual-resolution transformer designed for real-time semantic segmentation, offering a better balance of accuracy and efficiency than traditional CNN models by utilizing GPU-friendly attention and cross-resolution strategies.

Contribution

The paper introduces RTFormer, an efficient transformer architecture that reduces computational complexity and improves real-time segmentation performance compared to existing CNN-based methods.

Findings

01. Achieves state-of-the-art results on Cityscapes, CamVid, and COCOStuff benchmarks.

02. Demonstrates high inference efficiency on GPU-like devices.

03. Outperforms CNN-based models in the trade-off between accuracy and speed.

Abstract

Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate, due to the time-consuming computation mechanism of transformers. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves a better trade-off between performance and efficiency than CNN-based models. To achieve high inference efficiency on GPU-like devices, our RTFormer leverages GPU-Friendly Attention with linear complexity and discards the multi-head mechanism. Besides, we find that cross-resolution attention is more efficient at gathering global context information for the high-resolution branch by spreading the high-level knowledge learned from the low-resolution branch. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our…
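The complexity claims in the abstract can be made concrete with a rough cost count. The sketch below is not the paper's formulation, which the truncated abstract does not spell out. It assumes a generic softmax attention cost of two matrix products, and it assumes that linear complexity comes from attending to a fixed-size set of keys. The input size, strides, channel width, and `NUM_FIXED_KEYS` are all hypothetical values chosen for illustration.

```python
def attention_cost(num_queries: int, num_keys: int, dim: int) -> int:
    """Multiply-adds for softmax attention: Q @ K^T plus weights @ V."""
    return 2 * num_queries * num_keys * dim


def tokens(height: int, width: int, stride: int) -> int:
    """Number of feature-map positions at a given downsampling stride."""
    return (height // stride) * (width // stride)


# Hypothetical setting (not taken from the paper): a 1024x2048 input,
# a high-resolution branch at stride 8, a low-resolution branch at stride 32.
H, W, DIM = 1024, 2048, 64
NUM_FIXED_KEYS = 64  # assumed size of a fixed key set, independent of image size

n_high = tokens(H, W, 8)   # 128 * 256 positions
n_low = tokens(H, W, 32)   # 32 * 64 positions

# Full self-attention on the high-resolution branch: quadratic in n_high.
self_attn = attention_cost(n_high, n_high, DIM)

# Attention against a fixed-size key set: linear in n_high.
fixed_keys = attention_cost(n_high, NUM_FIXED_KEYS, DIM)

# Cross-resolution attention: high-resolution queries, low-resolution keys.
cross_res = attention_cost(n_high, n_low, DIM)

print(f"self-attention      : {self_attn:,} multiply-adds")
print(f"fixed-key attention : {fixed_keys:,} multiply-adds")
print(f"cross-resolution    : {cross_res:,} multiply-adds")
```

Under these assumptions, doubling the number of high-resolution tokens doubles the fixed-key cost but quadruples the self-attention cost, and drawing keys from the low-resolution branch shrinks the cost by the ratio `n_high / n_low`. The counts ignore softmax, normalization, and memory traffic, so they indicate scaling behavior only and say nothing about measured latency.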

Code & Models

Repositories

PaddlePaddle/PaddleSeg (paddle, official)

Videos

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer · slideslive

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition