Real-time Semantic Segmentation with Fast Attention

Ping Hu; Federico Perazzi; Fabian Caba Heilbron; Oliver Wang; Zhe Lin; Kate Saenko; Stan Sclaroff

arXiv:2007.03815·cs.CV·July 13, 2020·6 cites

TL;DR

This paper introduces fast spatial attention, a reordered variant of self-attention, and builds it into a CNN architecture that improves both accuracy and speed for real-time semantic segmentation of high-resolution images and videos.

Contribution

The paper proposes a new fast spatial attention module and an efficient architecture that reduces computational costs while maintaining high accuracy for real-time semantic segmentation.

Findings

01 Achieves 74.4% mIoU at 72 FPS on Cityscapes

02 About 50% faster than the previous state of the art at comparable accuracy

03 Processes high-resolution inputs with minimal loss in accuracy by applying extra spatial reduction to intermediate feature stages

Abstract

In deep CNN based models for semantic segmentation, high accuracy relies on rich spatial context (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in real-time. The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations. Moreover, to efficiently process high-resolution input, we apply an additional spatial reduction to intermediate feature stages of the network with minimal loss in accuracy thanks to the use of the fast attention…
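
The abstract says fast attention captures the same context as self-attention "by changing the order of operations". A minimal NumPy sketch of that idea follows. It assumes, since the abstract excerpt does not spell this out, that the softmax over affinities is replaced by L2-normalizing queries and keys, which makes the matrix product associative so that K^T V can be computed first. The function names, tensor shapes, and the 1/n scaling are illustrative choices, not the authors' code.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def attention_quadratic(q, k, v):
    """Standard ordering: build the full (n, n) affinity matrix first."""
    n = q.shape[0]
    affinity = q @ k.T            # shape (n, n), cost grows with n^2
    return (affinity @ v) / n


def attention_reordered(q, k, v):
    """Reordered: aggregate keys and values first, then apply queries."""
    n = q.shape[0]
    context = k.T @ v             # shape (c, c_v), independent of n^2
    return (q @ context) / n


def demo(n=1024, c=32, seed=0):
    """Run both orderings on random features for n positions, c channels."""
    rng = np.random.default_rng(seed)
    q = l2_normalize(rng.standard_normal((n, c)))
    k = l2_normalize(rng.standard_normal((n, c)))
    v = rng.standard_normal((n, c))
    return attention_quadratic(q, k, v), attention_reordered(q, k, v)


if __name__ == "__main__":
    slow, fast = demo()
    print("outputs match:", np.allclose(slow, fast))
```

With n spatial positions and c channels, the first ordering costs on the order of n²·c multiply-adds and stores an n×n matrix, while the second costs on the order of n·c² and never materializes the affinity matrix. For feature maps where n is much larger than c, that gap is the source of the saving.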

Code & Models

Repositories

feinanshan/FANet (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings