Framework-agnostic Semantically-aware Global Reasoning for Segmentation
Mir Rayat Imtiaz Hossain, Leonid Sigal, James J. Little

TL;DR
This paper introduces a flexible, scene-semantic global reasoning module that enhances segmentation by learning to project features into interpretable latent regions and reasoning over them with transformers, improving performance across various models.
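The TL;DR describes a pipeline: project features into latent regions, reason over them with a transformer, then fuse the result back into the features. A rough NumPy sketch of that pipeline follows. It is not the authors' implementation, and several choices in it are assumptions:

- Pixels are soft-assigned to K latent regions with a per-pixel softmax.
- Reasoning uses a single-head encoder layer whose layer norm has no learned parameters.
- Fusion concatenates the original and contextual features, then applies a linear projection.

All function and parameter names (`global_reasoning`, `w_assign`, `w_fuse`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Per-token normalisation without learned scale/shift (a simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def encoder_layer(tokens, params):
    # One single-head, post-norm transformer encoder layer over the K latent tokens.
    q = tokens @ params["wq"]
    k = tokens @ params["wk"]
    v = tokens @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (K, K) token-to-token weights
    x = layer_norm(tokens + attn @ v @ params["wo"])
    ffn = np.maximum(0.0, x @ params["w1"]) @ params["w2"]
    return layer_norm(x + ffn)


def global_reasoning(features, params):
    # features: (N, C) flattened pixel features, N = H * W.
    # 1) Soft-assign every pixel to K latent regions (assumed: softmax over K).
    assign = softmax(features @ params["w_assign"], axis=-1)  # (N, K)
    # 2) Project: each latent token is the assignment-weighted mean of pixel features.
    tokens = (assign.T @ features) / (assign.sum(axis=0)[:, None] + 1e-6)  # (K, C)
    # 3) Reason between the latent tokens with a transformer encoder layer.
    tokens = encoder_layer(tokens, params)
    # 4) Back-project tokens to pixels and fuse with the original features.
    context = assign @ tokens  # (N, C)
    fused = np.concatenate([features, context], axis=-1) @ params["w_fuse"]  # (N, C)
    return fused, assign


def init_params(c, k, hidden, rng):
    # Random weights, scaled by fan-in, purely for illustration.
    s = 1.0 / np.sqrt(c)
    return {
        "w_assign": rng.normal(0.0, s, (c, k)),
        "wq": rng.normal(0.0, s, (c, c)),
        "wk": rng.normal(0.0, s, (c, c)),
        "wv": rng.normal(0.0, s, (c, c)),
        "wo": rng.normal(0.0, s, (c, c)),
        "w1": rng.normal(0.0, s, (c, hidden)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, c)),
        "w_fuse": rng.normal(0.0, 1.0 / np.sqrt(2 * c), (2 * c, c)),
    }


rng = np.random.default_rng(0)
H, W, C, K = 8, 8, 16, 4
feats = rng.normal(size=(H * W, C))  # stand-in for a backbone feature map
params = init_params(C, K, 32, rng)
fused, assign = global_reasoning(feats, params)
print(fused.shape, assign.shape)  # (64, 16) (64, 4)
```

The fused output keeps the input shape. That property is what would let such a block be inserted into different segmentation backbones.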
Contribution
It proposes a novel semantic global reasoning component that can be integrated into different segmentation architectures, enabling scene-aware reasoning and improved results.
Findings
Improved segmentation accuracy across multiple datasets.
Latent tokens are semantically interpretable and diverse.
Enhanced downstream task performance, such as object detection.
Abstract
Recent advances in pixel-level tasks (e.g. segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such aggregated representations, often in the form of attention, fail to model the underlying semantics of the scene (e.g. individual objects and, by extension, their interactions). In this work, we address this issue by proposing a component that learns to project image features into latent representations and to reason between them using a transformer encoder. This generates contextualized, scene-consistent representations, which are then fused with the original image features. Our design encourages the latent regions to represent semantic concepts by ensuring that the activated regions are spatially disjoint and that the union of such regions corresponds to a connected object segment. The proposed semantic…
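The abstract names two constraints on the latent regions: activated regions should be spatially disjoint, and their union should match a connected object segment. These could plausibly be written as training penalties. The sketch below is hypothetical; the paper's exact losses are not given here. It makes three assumptions:

- Each latent region has an independent soft activation map in [0, 1], for example a sigmoid output. A per-pixel softmax over all K regions would make the union of all maps trivially 1.
- Disjointness is measured as the mean pairwise overlap per pixel.
- The union is a probabilistic OR, compared with a ground-truth object mask via soft Dice.

`region_ids` stands for whichever latent regions are matched to the object. It is an illustrative parameter.

```python
import numpy as np


def disjointness_loss(maps):
    # maps: (K, N) soft activation maps in [0, 1], one row per latent region.
    # Mean over pixels of sum_{i != j} m_i * m_j; zero when no two maps overlap.
    gram = maps @ maps.T                    # (K, K) pairwise overlaps summed over pixels
    off_diag = gram.sum() - np.trace(gram)  # drop each region's overlap with itself
    return off_diag / maps.shape[1]


def soft_union(maps):
    # Probabilistic OR: a pixel is covered unless every region is inactive there.
    return 1.0 - np.prod(1.0 - maps, axis=0)  # (N,)


def union_dice_loss(maps, region_ids, object_mask, eps=1e-6):
    # region_ids: hypothetical indices of the latent regions matched to one object.
    # object_mask: (N,) binary ground-truth mask of that (connected) object segment.
    union = soft_union(maps[region_ids])
    inter = (union * object_mask).sum()
    dice = (2.0 * inter + eps) / (union.sum() + object_mask.sum() + eps)
    return 1.0 - dice


# Toy example with K = 3 regions over N = 6 pixels.
maps_disjoint = np.array([[1, 1, 0, 0, 0, 0],
                          [0, 0, 1, 1, 0, 0],
                          [0, 0, 0, 0, 1, 1]], dtype=float)
maps_overlap = np.array([[1, 1, 1, 0, 0, 0],
                         [0, 0, 1, 1, 0, 0],
                         [0, 0, 0, 0, 1, 1]], dtype=float)
mask = np.array([1, 1, 1, 1, 0, 0], dtype=float)  # object covers the first 4 pixels

print(disjointness_loss(maps_disjoint))             # 0.0
print(disjointness_loss(maps_overlap))              # ≈ 0.333 (regions 0 and 1 share pixel 2)
print(union_dice_loss(maps_disjoint, [0, 1], mask))  # ≈ 0.0 (union matches the object)
print(union_dice_loss(maps_disjoint, [0], mask))     # ≈ 0.333 (union covers half the object)
```

The Dice term only pulls the union towards the given mask. Any connectivity comes from the target segment itself, not from the loss.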
Videos
Framework-Agnostic Semantically-Aware Global Reasoning for Segmentation (YouTube)
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Spatial Pyramid Pooling · Batch Normalization · Dilated Convolution · Atrous Spatial Pyramid Pooling · 1x1 Convolution · DeepLabv3
