Dynamic Focus-aware Positional Queries for Semantic Segmentation
Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, Dacheng Tao, Bohan Zhuang

TL;DR
This paper introduces Dynamic Focus-aware Positional Queries (DFPQ), a novel query design for semantic segmentation that dynamically generates positional queries based on cross-attention scores, leading to more accurate localization and state-of-the-art results.
Contribution
The paper proposes DFPQ, a new query design that enhances localization accuracy by dynamically generating positional queries conditioned on cross-attention scores, improving semantic segmentation performance.
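To make "positional queries conditioned on cross-attention scores" concrete, here is a minimal sketch. It assumes that each query's positional embedding is the attention-weighted average of the image-feature positional encodings it attended to in the preceding decoder block, with weights already softmax-normalized and averaged over heads. The function name `focus_aware_positional_queries` is hypothetical, any learnable projection the paper may apply on top is omitted, and plain Python lists stand in for batched tensors. This is an illustration of the idea, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def focus_aware_positional_queries(attn_weights: Matrix, pos_enc: Matrix) -> Matrix:
    """Hypothetical helper: build one positional query per object query.

    attn_weights: N x HW cross-attention weights from the preceding decoder
        block (assumed softmax-normalized and already averaged over heads).
    pos_enc: HW x C positional encodings of the flattened image features.
    Returns an N x C matrix: each row is the attention-weighted average of
    the positional encodings, so it follows where that query actually looked.
    """
    num_pixels = len(pos_enc)
    channels = len(pos_enc[0])
    queries = []
    for row in attn_weights:
        if len(row) != num_pixels:
            raise ValueError("each attention row must cover every pixel")
        # Weighted sum over pixels, computed channel by channel.
        queries.append(
            [sum(row[p] * pos_enc[p][c] for p in range(num_pixels)) for c in range(channels)]
        )
    return queries


# Toy example: 3 pixels with 2-D positional encodings, 2 object queries.
pos_enc = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attn = [
    [1.0, 0.0, 0.0],  # query 0 focuses entirely on pixel 0
    [0.0, 0.5, 0.5],  # query 1 splits its focus between pixels 1 and 2
]
print(focus_aware_positional_queries(attn, pos_enc))  # [[0.0, 0.0], [0.5, 0.5]]
```

Unlike a fixed learnable embedding, the resulting positional query changes per image: query 1 lands midway between the two pixels it attended to, rather than at a location learned from dataset-wide statistics.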
Findings
Achieves state-of-the-art performance on ADE20K and Cityscapes datasets.
Outperforms Mask2Former by 1.1% to 1.9% mIoU with various backbones.
Effectively handles high-resolution cross-attention with local relation aggregation.
Abstract
DETR-like segmentors, which train end to end a set of queries representing class prototypes or target segments, have underpinned the most recent breakthroughs in semantic segmentation. Recently, masked attention has been proposed to restrict each query to attend only to the foreground regions predicted by the preceding decoder block, which eases optimization. Although promising, it relies on learnable parameterized positional queries, which tend to encode dataset statistics and therefore localize distinct individual queries inaccurately. In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features,…
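The masked attention mentioned in the abstract (introduced by Mask2Former) can be sketched as a softmax over pixels in which every pixel outside a query's predicted foreground is excluded. This sketch assumes foreground means a sigmoid mask probability of at least 0.5 from the preceding block, and it adds a fallback to full attention when a query's mask is empty; that fallback is an assumption included to avoid a division by zero. The function name `masked_attention_weights` is hypothetical, and plain Python lists stand in for batched, multi-head tensors.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention_weights(logits: Matrix, mask_probs: Matrix, threshold: float = 0.5) -> Matrix:
    """Softmax over pixels, restricted to each query's predicted foreground.

    logits: N x HW query-key attention logits for the current decoder block.
    mask_probs: N x HW sigmoid mask probabilities from the preceding block.
    Pixels whose probability falls below `threshold` receive a -inf logit,
    so they get zero attention weight.
    """
    weights = []
    for logit_row, prob_row in zip(logits, mask_probs):
        masked = [
            logit if prob >= threshold else -math.inf
            for logit, prob in zip(logit_row, prob_row)
        ]
        # Safeguard (assumption): if nothing is predicted as foreground,
        # fall back to attending over all pixels instead of dividing by zero.
        if all(v == -math.inf for v in masked):
            masked = list(logit_row)
        peak = max(masked)
        exps = [math.exp(v - peak) for v in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


logits = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
mask_probs = [[0.9, 0.1, 0.6], [0.1, 0.2, 0.3]]
for row in masked_attention_weights(logits, mask_probs):
    print([round(w, 3) for w in row])
```

In this toy run the first query ignores pixel 1 entirely even though its logit is not the lowest, because the previous block predicted it as background. This hard restriction is what makes the accuracy of each query's localization matter.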
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Layer Normalization · Label Smoothing · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections
