Feature Pyramid Encoding Network for Real-time Semantic Segmentation

Mengyu Liu; Hujun Yin

arXiv:1909.08599·cs.CV·September 19, 2019·19 cites

TL;DR

This paper introduces FPENet, a lightweight and efficient neural network architecture for real-time semantic segmentation that balances accuracy with speed and memory usage.

Contribution

The paper proposes a novel feature pyramid encoding network with a mutual embedding upsample module, achieving high accuracy with fewer parameters for real-time segmentation.

Findings

01. Achieves 68.0% mean IoU on Cityscapes with 0.4M parameters

02. Runs at 102 FPS on an NVIDIA TITAN V GPU

03. Outperforms existing real-time methods on Cityscapes and CamVid with fewer parameters and faster inference
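
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how mean IoU and per-frame latency can be computed. It is not the authors' evaluation code: the function names are hypothetical, labels are given as flat lists for brevity, and benchmark tooling such as the Cityscapes scripts typically accumulates intersections and unions over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (illustrative helper).

    pred and target are flat sequences of integer class labels of equal length.
    Classes that appear in neither sequence are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def frame_latency_ms(fps):
    """Convert a throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Toy example with two classes and four pixels.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    # The reported 102 FPS corresponds to just under 10 ms per frame.
    print(frame_latency_ms(102))
```

Under this conversion, 102 FPS leaves a budget of about 9.8 ms per frame.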

Abstract

Although current deep learning methods have achieved impressive results for semantic segmentation, they incur high computational costs and have a huge number of parameters. For real-time applications, inference speed and memory usage are two important factors. To address the challenge, we propose a lightweight feature pyramid encoding network (FPENet) to make a good trade-off between accuracy and speed. Specifically, we use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated convolutions in all stages of the encoder. A mutual embedding upsample module is introduced in the decoder to aggregate the high-level semantic features and low-level spatial details efficiently. The proposed network outperforms existing real-time methods with fewer parameters and improved inference speed on the Cityscapes and CamVid benchmark datasets. Specifically,…
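
The abstract attributes the small parameter count to depthwise dilated convolutions. The sketch below illustrates why, under stated assumptions: the channel count, kernel size, and dilation rates are illustrative and not taken from the paper, the helper names are hypothetical, and the convolution is written in 1D with plain Python lists for brevity, whereas the network itself operates on 2D feature maps.

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_params(k, channels):
    """Weight count of a k x k depthwise convolution: one filter per channel."""
    return k * k * channels


def effective_kernel(k, dilation):
    """Extent covered by a k-tap kernel whose taps are `dilation` apart."""
    return k + (k - 1) * (dilation - 1)


def depthwise_dilated_conv1d(x, kernels, dilation):
    """Apply one kernel per channel with the given dilation (no padding).

    x: list of channels, each a list of numbers.
    kernels: list with one kernel (list of numbers) per channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        span = (len(kernel) - 1) * dilation
        out.append([
            sum(kernel[j] * channel[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(channel) - span)
        ])
    return out


if __name__ == "__main__":
    # Illustrative sizes only: 64 channels, 3 x 3 kernels.
    print(conv_params(3, 64, 64), depthwise_params(3, 64))
    # Larger dilation widens the context without adding weights.
    print([effective_kernel(3, d) for d in (1, 2, 4, 8)])
    print(depthwise_dilated_conv1d([[1, 2, 3, 4, 5]], [[1, 0, -1]], dilation=2))
```

A depthwise layer keeps each channel separate, so its weight count grows linearly with the number of channels instead of quadratically, and dilation enlarges the receptive field at no extra parameter cost. Applying several dilation rates side by side is one way to gather multi-scale context of the kind the abstract describes.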

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings