Feature Pyramid Encoding Network for Real-time Semantic Segmentation
Mengyu Liu, Hujun Yin

TL;DR
This paper introduces FPENet, a lightweight and efficient neural network architecture for real-time semantic segmentation that balances accuracy with speed and memory usage.
Contribution
The paper proposes a novel feature pyramid encoding network with a mutual embedding upsample module, achieving high accuracy with fewer parameters for real-time segmentation.
Findings
Achieves 68.0% mean IoU on Cityscapes with 0.4M parameters
Runs at 102 FPS on NVIDIA TITAN V GPU
Outperforms existing real-time segmentation methods
Abstract
Although current deep learning methods have achieved impressive results for semantic segmentation, they incur high computational costs and have a huge number of parameters. For real-time applications, inference speed and memory usage are two important factors. To address the challenge, we propose a lightweight feature pyramid encoding network (FPENet) to make a good trade-off between accuracy and speed. Specifically, we use a feature pyramid encoding block to encode multi-scale contextual features with depthwise dilated convolutions in all stages of the encoder. A mutual embedding upsample module is introduced in the decoder to aggregate the high-level semantic features and low-level spatial details efficiently. The proposed network outperforms existing real-time methods with fewer parameters and improved inference speed on the Cityscapes and CamVid benchmark datasets. Specifically,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
