Local Memory Attention for Fast Video Semantic Segmentation
Matthieu Paul, Martin Danelljan, Luc Van Gool, Radu Timofte

TL;DR
This paper introduces a fast, general local memory attention module that enhances video semantic segmentation by efficiently integrating information from past frames, improving accuracy with only a small increase in inference time.
Contribution
A novel local attention-based memory module that can be integrated into existing segmentation models to improve video segmentation performance efficiently.
Findings
Improved mIoU by 1.7% on Cityscapes with ERFNet
Enhanced mIoU by 2.1% on Cityscapes with PSPNet
Increased inference time by only 1.5ms for ERFNet
Abstract
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline. In contrast to prior works, we strive towards a simple, fast, and general module that can be integrated into virtually any single-frame architecture. Our approach aggregates a rich representation of the semantic information in past frames into a memory module. Information stored in the memory is then accessed through an attention mechanism. In contrast to previous memory-based approaches, we propose a fast local attention layer, providing temporal appearance cues in the local region of prior frames. We further fuse these cues with an encoding of the current frame through a second attention-based module. The segmentation decoder processes the fused representation to predict the final semantic segmentation. We integrate our approach…
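As an illustration of the local attention idea described in the abstract, here is a minimal NumPy sketch in which each location of the current frame attends only to a small window around the same location in one memory frame. It is not the authors' implementation: the function name `local_memory_attention`, the `radius` parameter, the scaled dot-product scoring, the softmax normalization, the zero padding with border masking, and the use of a single memory frame are all assumptions made for this example.

```python
import numpy as np

def local_memory_attention(query, mem_keys, mem_values, radius=1):
    """Hypothetical local attention read from a single memory frame.

    query:      (H, W, C) features of the current frame
    mem_keys:   (H, W, C) key features of a past frame
    mem_values: (H, W, D) value features of the same past frame
    Returns an (H, W, D) array: for every location, a weighted average of the
    memory values inside a (2*radius+1) x (2*radius+1) window around it.
    """
    H, W, C = query.shape
    k = 2 * radius + 1

    # Zero-pad the memory so windows at the image border are well defined,
    # and track which padded positions are real so padding gets zero weight.
    pad = ((radius, radius), (radius, radius), (0, 0))
    keys_p = np.pad(mem_keys, pad)
    vals_p = np.pad(mem_values, pad)
    valid_p = np.pad(np.ones((H, W), dtype=bool), radius)

    scores = np.empty((H, W, k * k))
    values = np.empty((H, W, k * k, mem_values.shape[-1]))
    valid = np.empty((H, W, k * k), dtype=bool)

    idx = 0
    for dy in range(k):
        for dx in range(k):
            # Shifted view of the memory: one neighbour offset per iteration.
            key_shift = keys_p[dy:dy + H, dx:dx + W]
            scores[..., idx] = (query * key_shift).sum(-1) / np.sqrt(C)
            values[..., idx, :] = vals_p[dy:dy + H, dx:dx + W]
            valid[..., idx] = valid_p[dy:dy + H, dx:dx + W]
            idx += 1

    # Softmax over the window only; padded positions are masked out.
    scores = np.where(valid, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)

    return (weights[..., None] * values).sum(axis=2)
```

Under these assumptions the cost grows with H·W·(2·radius+1)² score computations, rather than (H·W)² for attention over every pair of locations, which is the general reason a local window is cheaper than a global memory read.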
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Average Pooling · Convolution · Auxiliary Classifier · Batch Normalization · Pyramid Pooling Module · Dilated Convolution · PSPNet
