RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
Jian Wang, Chenhui Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

TL;DR
RTFormer is a novel dual-resolution transformer designed for real-time semantic segmentation, offering a better balance of accuracy and efficiency than traditional CNN models by utilizing GPU-friendly attention and cross-resolution strategies.
Contribution
The paper introduces RTFormer, an efficient transformer architecture that reduces computational complexity and improves real-time segmentation performance compared to existing CNN-based methods.
Findings
Achieves state-of-the-art results on Cityscapes, CamVid, and COCOStuff benchmarks.
Demonstrates high inference efficiency on GPU-like devices.
Outperforms CNN-based models in the trade-off between accuracy and speed.
Abstract
Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate in this field, due to the time-consuming computation mechanism of transformer. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves better trade-off between performance and efficiency than CNN-based models. To achieve high inference efficiency on GPU-like devices, our RTFormer leverages GPU-Friendly Attention with linear complexity and discards the multi-head mechanism. Besides, we find that cross-resolution attention is more efficient to gather global context information for high-resolution branch by spreading the high level knowledge learned from low-resolution branch. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our…
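The two attention ideas named in the abstract can be illustrated with a small NumPy sketch. This is not RTFormer's implementation: the abstract does not specify how GPU-Friendly Attention is built, so the first function assumes one common way of getting linear complexity without multiple heads, namely attending to a small fixed-size learned memory. The second function shows cross-resolution attention in its plainest form, with queries from the high-resolution branch and keys and values from the low-resolution branch. All function and parameter names (`external_memory_attention`, `cross_resolution_attention`, `mem_k`, `mem_v`, `w_q`, `w_k`, `w_v`) are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def external_memory_attention(x, mem_k, mem_v):
    """Single-head attention against a fixed-size memory (assumed design).

    x:     (N, d) flattened feature map with N pixels
    mem_k: (M, d) learned keys, M independent of N
    mem_v: (M, d) learned values
    Cost is O(N * M * d), i.e. linear in the number of pixels N,
    versus O(N^2 * d) for ordinary self-attention.
    """
    scores = x @ mem_k.T / np.sqrt(x.shape[1])  # (N, M)
    weights = softmax(scores, axis=1)           # each row sums to 1
    return weights @ mem_v                      # (N, d)


def cross_resolution_attention(high, low, w_q, w_k, w_v):
    """High-resolution queries attend to low-resolution keys and values.

    high: (N_high, d) features from the high-resolution branch
    low:  (N_low, d)  features from the low-resolution branch
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters)
    The attention map is (N_high, N_low), so global context from the
    low-resolution branch reaches every high-resolution position.
    """
    q = high @ w_q
    k = low @ w_k
    v = low @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    return weights @ v                          # (N_high, d)
```

With, say, 64 high-resolution tokens and 16 low-resolution tokens, the cross-resolution attention map has 64 × 16 entries rather than the 64 × 64 that self-attention on the high-resolution branch alone would need, which is one plausible reading of why the abstract calls it more efficient.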
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
