Local Memory Attention for Fast Video Semantic Segmentation

Matthieu Paul; Martin Danelljan; Luc Van Gool; Radu Timofte

arXiv:2101.01715 · cs.CV · September 28, 2021

TL;DR

This paper introduces a fast, general local memory attention module that enhances video semantic segmentation by efficiently integrating past frame information, improving accuracy with minimal inference time increase.

Contribution

A novel local attention-based memory module that can be integrated into existing segmentation models to improve video segmentation performance efficiently.

Findings

01. Improved mIoU by 1.7% on Cityscapes with ERFNet

02. Enhanced mIoU by 2.1% on Cityscapes with PSPNet

03. Increased inference time by only 1.5ms for ERFNet

Abstract

We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline. In contrast to prior works, we strive towards a simple, fast, and general module that can be integrated into virtually any single-frame architecture. Our approach aggregates a rich representation of the semantic information in past frames into a memory module. Information stored in the memory is then accessed through an attention mechanism. In contrast to previous memory-based approaches, we propose a fast local attention layer, providing temporal appearance cues in the local region of prior frames. We further fuse these cues with an encoding of the current frame through a second attention-based module. The segmentation decoder processes the fused representation to predict the final semantic segmentation. We integrate our approach…
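
The local attention read described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that lives in the mattpfr/lmanet repository); it only shows the general idea of letting each pixel of the current frame attend to a small spatial window in stored past frames. The function name `local_memory_attention`, the key/value split of the memory, the `radius` parameter, the scaled dot-product similarity, and the masking of out-of-image positions are all assumptions made for illustration. The second attention-based fusion module and the decoder are not covered.

```python
import numpy as np

def local_memory_attention(query, mem_keys, mem_values, radius=1):
    """Hypothetical local attention read over a memory of past frames.

    query:      (H, W, C)    encoding of the current frame
    mem_keys:   (T, H, W, C) keys of T past frames (assumed layout)
    mem_values: (T, H, W, D) values of T past frames (assumed layout)
    Returns an (H, W, D) map of attended memory features.
    """
    T, H, W, C = mem_keys.shape
    D = mem_values.shape[-1]
    k = 2 * radius + 1  # side length of the local window

    # Zero-pad the spatial dimensions so every pixel has a full k x k window.
    pad = ((0, 0), (radius, radius), (radius, radius), (0, 0))
    keys_p = np.pad(mem_keys, pad)
    vals_p = np.pad(mem_values, pad)
    # Track which padded positions lie inside the real image.
    valid_p = np.pad(np.ones((T, H, W), dtype=bool), pad[:3])

    out = np.zeros((H, W, D))
    for y in range(H):
        for x in range(W):
            # Gather the window around (y, x) from every past frame.
            kw = keys_p[:, y:y + k, x:x + k, :].reshape(-1, C)
            vw = vals_p[:, y:y + k, x:x + k, :].reshape(-1, D)
            mask = valid_p[:, y:y + k, x:x + k].reshape(-1)

            # Scaled dot-product similarity (an assumed choice of similarity).
            scores = kw @ query[y, x] / np.sqrt(C)
            scores = np.where(mask, scores, -np.inf)  # ignore padded positions

            # Numerically stable softmax over the T * k * k candidates.
            scores = scores - scores.max()
            weights = np.exp(scores)
            weights /= weights.sum()

            out[y, x] = weights @ vw
    return out
```

Because each query compares against only T·k² memory positions rather than all T·H·W, the cost per pixel does not grow with image size, which is the intuition behind a local layer being cheaper than attention over the full memory. A practical version would vectorize the two loops, for example with an unfold or correlation operation on the GPU.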

Code & Models

Repositories

mattpfr/lmanet (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Average Pooling · Convolution · Auxiliary Classifier · Batch Normalization · Pyramid Pooling Module · Dilated Convolution · PSPNet