Expectation-Maximization Attention Networks for Semantic Segmentation
Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu

TL;DR
This paper introduces Expectation-Maximization Attention (EMA), a novel, efficient attention mechanism for semantic segmentation that captures long-range relations with reduced computation and noise, achieving state-of-the-art results.
Contribution
The paper formulates attention as an expectation-maximization (EM) process that yields a compact, low-rank representation, improving efficiency and robustness in semantic segmentation.
Findings
Sets new records on the PASCAL VOC, PASCAL Context, and COCO Stuff benchmarks.
EMA reduces computational cost compared to traditional self-attention.
Stabilizes training through bases maintenance and normalization.
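As a rough illustration of the cost claim above, the snippet below compares the number of entries in a full self-attention map (every position against every position) with a map computed against a compact set of bases. The feature-map size (65 x 65) and the number of bases (64) are illustrative assumptions, not values taken from this summary, and the function name is hypothetical.

```python
def attention_map_sizes(height, width, num_bases):
    """Return (full, compact) attention-map entry counts.

    full:    every position attends to every position, N * N entries.
    compact: every position attends to K bases, N * K entries.
    All sizes passed in are illustrative assumptions.
    """
    num_positions = height * width
    full = num_positions * num_positions
    compact = num_positions * num_bases
    return full, compact


full, compact = attention_map_sizes(height=65, width=65, num_bases=64)
print(full, compact, full / compact)
```

The ratio between the two counts is N / K, so the saving grows with the spatial size of the feature map. An iterative scheme recomputes the compact map once per iteration, which multiplies the compact cost by the (small) number of iterations.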
Abstract
The self-attention mechanism has been widely used for various tasks. It is designed to compute the representation of each position by a weighted sum of the features at all positions. Thus, it can capture long-range relations for computer vision tasks. However, it is computationally expensive, since the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation upon these bases, the resulting representation is low-rank and deprecates noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of input and is also friendly in memory and computation. Moreover, we set up the bases maintenance and normalization methods to…
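The iteration described in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch, not the authors' implementation: the softmax responsibilities, the random initialization, the L2 normalization of the bases, and the parameter names (`num_bases`, `num_iters`) are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def em_attention(x, num_bases=8, num_iters=3, seed=0):
    """Sketch of EM-style attention over a compact set of bases.

    x: array of shape (N, C), one C-dimensional feature per position.
    Returns (x_rec, z, mu): the low-rank reconstruction (N, C),
    the attention maps (N, K), and the bases (K, C).
    """
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal((num_bases, x.shape[1]))   # assumed random init
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    for _ in range(num_iters):
        # E step: attention of every position over the K bases.
        z = softmax(x @ mu.T, axis=1)                   # (N, K)
        # M step: each basis becomes the attention-weighted mean of features.
        mu = (z.T @ x) / (z.sum(axis=0)[:, None] + 1e-6)
        # Assumed normalization: keep each basis at unit L2 norm.
        mu /= np.linalg.norm(mu, axis=1, keepdims=True) + 1e-6
    # Weighted summation upon the bases gives a rank-at-most-K output.
    x_rec = z @ mu                                      # (N, C)
    return x_rec, z, mu


x = np.random.default_rng(1).standard_normal((100, 16))
x_rec, z, mu = em_attention(x, num_bases=4)
print(x_rec.shape, z.shape, mu.shape)
```

Because the output is the product of an (N, K) map and a (K, C) set of bases, its rank cannot exceed K, which is the sense in which the representation is low-rank.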
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
