DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation
Hanchao Li, Pengfei Xiong, Haoqiang Fan, Jian Sun

TL;DR
DFANet is a highly efficient CNN architecture designed for real-time semantic segmentation, balancing speed and accuracy by reducing parameters and FLOPs while maintaining strong performance on benchmark datasets.
Contribution
The paper introduces DFANet, a lightweight CNN that aggregates multi-scale features through sub-network and sub-stage cascades, reaching accuracy comparable to existing state-of-the-art real-time segmentation methods at a fraction of the computational cost.
Findings
Achieves 70.3% Mean IOU on Cityscapes with 1.7 GFLOPs
Runs at 160 FPS on a Titan X GPU
Uses 8× fewer FLOPs than existing state-of-the-art real-time segmentation methods
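For readers unfamiliar with the metric quoted above, here is a minimal sketch of how mean IoU (intersection over union, averaged across classes) is typically computed from predicted and ground-truth label maps. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and benchmark-specific details such as ignored labels are omitted.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred, target: flat sequences of integer class labels, equal length.
    (Hypothetical helper for illustration; not the paper's evaluation code.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mean IoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported figure such as 70.3% is the same quantity accumulated over every pixel of the test set.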
Abstract
This paper introduces an extremely efficient CNN architecture named DFANet for semantic segmentation under resource constraints. Our proposed network starts from a single lightweight backbone and aggregates discriminative features through sub-network and sub-stage cascade respectively. Based on the multi-scale feature propagation, DFANet substantially reduces the number of parameters, but still obtains sufficient receptive field and enhances the model learning ability, which strikes a balance between the speed and segmentation performance. Experiments on Cityscapes and CamVid datasets demonstrate the superior performance of DFANet with 8× less FLOPs and 2× faster than the existing state-of-the-art real-time semantic segmentation methods while providing comparable accuracy. Specifically, it achieves 70.3% Mean IOU on the Cityscapes test dataset with only 1.7 GFLOPs and a…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
