TL;DR
A3-FPN introduces an asymptotic content-aware pyramid attention network that enhances multi-scale feature representation for dense visual prediction, improving performance on benchmarks like MS COCO and Cityscapes.
Contribution
It proposes a novel asymptotically disentangled framework with content-aware attention modules to better capture discriminative features and recognize small objects.
Findings
Achieves 49.6 mask AP on MS COCO with a Swin-L backbone.
Improves Cityscapes mIoU to 85.6.
Demonstrates compatibility with CNN and Transformer architectures.
Abstract
Learning multi-scale representations is the common strategy to tackle object scale variation in dense prediction tasks. Although existing feature pyramid networks have greatly advanced visual recognition, inherent design defects inhibit them from capturing discriminative features and recognizing small objects. In this work, we propose the Asymptotic Content-Aware Pyramid Attention Network (A3-FPN) to augment multi-scale feature representation via an asymptotically disentangled framework and content-aware attention modules. Specifically, A3-FPN employs a horizontally-spread column network that enables asymptotically global feature interaction and disentangles each level from all hierarchical representations. In feature fusion, it collects supplementary content from the adjacent level to generate position-wise offsets and weights for context-aware resampling, and learns deep context…
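The abstract describes fusion in which content from an adjacent pyramid level drives position-wise offsets and weights for resampling, but it does not specify the operators. The sketch below illustrates that general idea (deformable-style sampling of a neighbouring level) under loudly stated assumptions: single-channel maps, the adjacent level is the next coarser one at stride 2, offsets and the fusion weight are predicted per position by a linear function of the two levels' values (a hypothetical stand-in for a learned convolution), and fusion is additive. The names `content_aware_fuse`, `bilinear`, and the `params` layout are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]
Coeffs = Tuple[float, float, float]


def bilinear(feat: Grid, y: float, x: float) -> float:
    """Sample a single-channel map at fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(coeffs: Coeffs, fine_val: float, coarse_val: float) -> float:
    # HYPOTHETICAL per-position predictor: a * fine + b * coarse + c.
    a, b, c = coeffs
    return a * fine_val + b * coarse_val + c


def content_aware_fuse(fine: Grid, coarse: Grid, params: Dict[str, Coeffs]) -> Grid:
    """Add a resampled coarser level to `fine` using content-predicted offsets/weights.

    `params` maps "oy", "ox" (offsets on the coarse grid) and "w" (fusion-weight
    logit) to coefficient triples for `linear`. Assumes `coarse` has stride 2.
    """
    h, w = len(fine), len(fine[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Base location of fine pixel (i, j) on the coarse grid.
            cy, cx = i / 2.0, j / 2.0
            # Supplementary content from the adjacent (coarser) level.
            coarse_val = bilinear(coarse, cy, cx)
            fine_val = fine[i][j]
            # Position-wise offsets and weight predicted from both levels.
            off_y = linear(params["oy"], fine_val, coarse_val)
            off_x = linear(params["ox"], fine_val, coarse_val)
            weight = sigmoid(linear(params["w"], fine_val, coarse_val))
            # Resample the coarse map at the content-dependent shifted location.
            sampled = bilinear(coarse, cy + off_y, cx + off_x)
            out[i][j] = fine_val + weight * sampled
    return out


if __name__ == "__main__":
    fine = [[0.0] * 4 for _ in range(4)]
    coarse = [[0.0, 2.0], [4.0, 6.0]]
    zero = {"oy": (0.0, 0.0, 0.0), "ox": (0.0, 0.0, 0.0), "w": (0.0, 0.0, 0.0)}
    for row in content_aware_fuse(fine, coarse, zero):
        print(row)
```

With all-zero coefficients the offsets vanish and the weight is 0.5, so the result reduces to plain bilinear upsampling scaled by one half; non-zero coefficients let each position shift where it reads from the coarser level and how strongly it mixes that content in. A real network would operate on multi-channel tensors with learned convolutions; the explicit loops here only make the per-position logic visible.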
