TL;DR
FlatteNet is a lightweight, versatile framework for dense pixelwise prediction that produces high-resolution predictions without a complex decoder, with competitive results reported for human pose estimation, semantic segmentation, and object detection.
Contribution
The paper proposes the Flattening Module, a lightweight component that produces high-resolution predictions without removing subsampling operations or adding a complicated decoder, and that can be combined with existing FCN backbones.
Findings
Achieves competitive results in human pose estimation on MPII.
Performs well in semantic segmentation on PASCAL-Context.
Effective in object detection on PASCAL VOC.
Abstract
In this paper, we focus on devising a versatile framework for dense pixelwise prediction whose goal is to assign a discrete or continuous label to each pixel of an image. It is well known that the reduced feature resolution due to repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to the commonly used strategies, such as dilated convolution and encoder-decoder structure, we introduce the Flattening Module to produce high-resolution predictions without either removing any subsampling operations or building a complicated decoder module. In addition, the Flattening Module is lightweight and can be easily combined with any existing FCN, allowing the model builder to trade off among model size, computational cost and accuracy by simply choosing different backbone networks. We empirically demonstrate the effectiveness of…
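The abstract excerpt names the resolution problem but does not describe how the Flattening Module works internally. The sketch below is therefore only an illustration under stated assumptions: `subsampled_size` shows how repeated stride-2 subsampling shrinks a feature map, and `depth_to_space` shows one generic, decoder-free way to trade channels for spatial resolution (the rearrangement used in sub-pixel convolution). Both function names are hypothetical, and whether the Flattening Module uses this exact rearrangement is an assumption, not something stated in the text above.

```python
def subsampled_size(size, num_stages, stride=2):
    """Spatial size after `num_stages` subsampling steps.

    Assumes each step divides the size by `stride`, rounding up,
    as a padded strided convolution or pooling layer typically does.
    """
    for _ in range(num_stages):
        size = -(-size // stride)  # ceiling division
    return size


def depth_to_space(x, r):
    """Rearrange a (C*r*r) x H x W nested list into C x (H*r) x (W*r).

    Generic depth-to-space rearrangement; NOT claimed to be the
    paper's Flattening Module, only an illustrative stand-in.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h):
            for j in range(w):
                for di in range(r):
                    for dj in range(r):
                        # Each block of r*r input channels fills one
                        # r x r spatial patch of the output channel.
                        src = c * r * r + di * r + dj
                        out[c][i * r + di][j * r + dj] = x[src][i][j]
    return out


# A 512-pixel side shrinks to 16 after five stride-2 stages.
low_res_side = subsampled_size(512, 5)

# Four 1x1 channels become one 2x2 channel.
upsampled = depth_to_space([[[1]], [[2]], [[3]], [[4]]], 2)
```

With an overall stride of 32, a rearrangement factor of `r = 32` would restore the input resolution at the cost of requiring `32 * 32` channels per output channel, which is why the choice of factor and channel budget matters in any approach of this kind.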
Taxonomy
Methods: Dilated Convolution · Convolution
