Expectation-Maximization Attention Networks for Semantic Segmentation

Xia Li; Zhisheng Zhong; Jianlong Wu; Yibo Yang; Zhouchen Lin; Hong Liu

arXiv:1907.13426·cs.CV·May 28, 2024·113 cites

TL;DR

This paper introduces Expectation-Maximization Attention (EMA), a novel, efficient attention mechanism for semantic segmentation that captures long-range relations with reduced computation and noise, achieving state-of-the-art results.

Contribution

It formulates attention as an EM process to create a compact, low-rank representation, improving efficiency and robustness in semantic segmentation tasks.

Findings

01. EMA achieves new records on the PASCAL VOC, PASCAL Context, and COCO Stuff benchmarks.

02. EMA reduces computational cost compared to traditional self-attention.

03. Bases maintenance and normalization stabilize the training procedure.
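
To make the cost claim concrete, here is a rough back-of-the-envelope count of multiply-adds, assuming N positions, C channels, K bases, and T EM iterations. The function name, the sizes (a 64x64 feature map, C=512, K=64, T=3), and the counting model itself are illustrative assumptions, not figures taken from this page; softmax, normalization, and convolution costs are ignored.

```python
def attention_mult_adds(n, c, k, t):
    """Rough multiply-add counts for the matrix products only (hypothetical helper)."""
    # Self-attention: N x N affinity map (N*N*C) plus weighted aggregation (N*N*C).
    self_attn = 2 * n * n * c
    # EM attention: each of T iterations does an E-step (N*K*C) and an
    # M-step (N*K*C); one final reconstruction from the bases adds N*K*C.
    ema = (2 * t + 1) * n * k * c
    return self_attn, ema


# Illustrative sizes (assumed): 64x64 positions, 512 channels, 64 bases, 3 iterations.
self_attn_cost, ema_cost = attention_mult_adds(n=64 * 64, c=512, k=64, t=3)
ratio = self_attn_cost / ema_cost
print(f"self-attention: {self_attn_cost:,}  EMA: {ema_cost:,}  ratio: {ratio:.1f}x")
```

Under this simplified model the saving scales as 2N / ((2T + 1)K), so it grows with the number of positions as long as K and T stay small.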

Abstract

The self-attention mechanism has been widely used for various tasks. It computes the representation of each position as a weighted sum of the features at all positions, and can therefore capture long-range relations in computer vision tasks. However, it is computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation over these bases, the resulting representation is low-rank and suppresses noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also efficient in memory and computation. Moreover, we set up the bases maintenance and normalization methods to…
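
The EM formulation described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the softmax-over-inner-products responsibilities, the random initialization, and the sizes are all hypothetical, and the bases maintenance and normalization steps mentioned in the abstract are omitted.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def em_attention(x, mu, n_iters=3, eps=1e-6):
    """Sketch of EM-style attention (assumed form).

    x:  (N, C) features, one row per position.
    mu: (K, C) initial bases, with K much smaller than N.
    Returns the reconstruction (N, C), the final bases (K, C),
    and the attention maps / responsibilities (N, K).
    """
    for _ in range(n_iters):
        # E-step: responsibility of each basis for each position.
        z = softmax(x @ mu.T, axis=1)                  # (N, K)
        # M-step: each basis becomes a responsibility-weighted mean of features.
        w = z / (z.sum(axis=0, keepdims=True) + eps)   # normalize per basis
        mu = w.T @ x                                   # (K, C)
    # Re-estimate features as a weighted sum of the bases; rank is at most K.
    x_tilde = z @ mu                                   # (N, C)
    return x_tilde, mu, z


# Usage with assumed toy sizes: 50 positions, 8 channels, 4 bases.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))
mu0 = rng.standard_normal((4, 8))
x_tilde, mu, z = em_attention(x, mu0)
print(x_tilde.shape, mu.shape, z.shape)
```

Because every output row is a combination of only K bases, the attention maps have shape (N, K) rather than (N, N), which is where the low-rank property and the memory saving come from.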

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning