CCNet: Criss-Cross Attention for Semantic Segmentation

Zilong Huang; Xinggang Wang; Yunchao Wei; Lichao Huang; Humphrey Shi; Wenyu Liu; Thomas S. Huang

arXiv:1811.11721 · cs.CV · July 10, 2020 · 349 cites

TL;DR

CCNet introduces a novel criss-cross attention module that efficiently captures full-image contextual information for semantic segmentation, achieving state-of-the-art results with reduced memory and computational costs.

Contribution

The paper proposes a recurrent criss-cross attention module that is GPU memory friendly, computationally efficient, and effective for full-image context modeling in semantic segmentation.

Findings

01. Achieves state-of-the-art mIoU scores on Cityscapes, ADE20K, and LIP benchmarks.

02. Requires 11x less GPU memory than non-local blocks.

03. Reduces FLOPs by approximately 85% compared to non-local attention.
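The savings come from the size of the attention map: a non-local block lets every pixel attend to every other pixel, while criss-cross attention restricts each pixel to its own row and column. The sketch below only counts attention-map entries under that assumption. The function name `attention_entries` and the 97 x 97 feature map are hypothetical choices for illustration, and the resulting ratio is not expected to match the measured 11x memory and 85% FLOPs figures, which cover the whole module.

```python
def attention_entries(height, width, recurrences=2):
    """Count attention-map entries for one feature map (illustrative only).

    Assumes a non-local block stores one weight per pixel pair, and a
    criss-cross pass stores height + width - 1 weights per pixel
    (its row plus its column, with the pixel itself counted once).
    """
    pixels = height * width
    non_local = pixels * pixels
    criss_cross = recurrences * pixels * (height + width - 1)
    return non_local, criss_cross


# Hypothetical feature-map size, chosen only for illustration.
non_local, criss_cross = attention_entries(97, 97)
print(f"non-local entries:   {non_local}")
print(f"criss-cross entries: {criss_cross}")
print(f"ratio:               {non_local / criss_cross:.1f}x")
```

The count grows as (H·W)² for the non-local case but only as (H·W)·(H+W-1) per criss-cross pass, so the gap widens as the feature map gets larger.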

Abstract

Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information effectively and efficiently. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. In addition, a category consistent loss is proposed to encourage the criss-cross attention module to produce more discriminative features. Overall, CCNet has the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention significantly…
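The claim that one further recurrent operation is enough for full-image dependencies can be checked with a small reachability sketch. It does not compute attention weights or features; it only tracks which positions can contribute information to a given pixel after each pass. The helper names `criss_cross_path` and `reachable` and the 5 x 7 grid are hypothetical and chosen for illustration.

```python
def criss_cross_path(i, j, height, width):
    """Positions sharing a row or a column with (i, j), including (i, j)."""
    row = {(i, v) for v in range(width)}
    col = {(u, j) for u in range(height)}
    return row | col


def reachable(i, j, height, width, recurrences):
    """Positions whose information can reach (i, j) after the given number
    of criss-cross passes (connectivity only, no attention weights)."""
    seen = {(i, j)}
    for _ in range(recurrences):
        # Each pass lets every already-reached position pull in its own path.
        seen = set().union(
            *(criss_cross_path(u, v, height, width) for (u, v) in seen)
        )
    return seen


H, W = 5, 7  # hypothetical grid size
print(len(reachable(2, 3, H, W, 1)))  # 11, i.e. H + W - 1
print(len(reachable(2, 3, H, W, 2)))  # 35, i.e. H * W
```

After one pass a pixel sees only its row and column. After two passes any position (u, v) is connected to (i, j) through the intermediate position (i, v) or (u, j), which is why the second pass covers the whole grid.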

Code & Models

Models

mccaly/test2 · model · 12 dl · ♡ 1

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Criss-Cross Network · Residual Connection · Non-Local Operation · 1x1 Convolution · Non-Local Block