Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation

Chi-Wei Hsiao; Cheng Sun; Hwann-Tzong Chen; Min Sun

arXiv:2108.01866·cs.CV·August 20, 2021

TL;DR

This paper introduces a pyramidal output representation with a "specialize and fuse" process for semantic segmentation. The design predicts only a small number of labels for large uniform regions and achieves state-of-the-art results on ADE20K, COCO-Stuff, and Pascal-Context.

Contribution

The paper proposes a novel pyramidal output structure and a fusion method that together improve the accuracy and efficiency of semantic segmentation.

Findings

01 Achieves state-of-the-art performance on the ADE20K, COCO-Stuff, and Pascal-Context datasets.

02 Demonstrates the effectiveness of the pyramidal output and fusion modules through ablation studies.

03 Reduces prediction complexity by predicting a single label for each unity-cell, that is, a cell whose pixels all share the same semantic label.

Abstract

We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation. A pyramidal "output" representation consists of coarse-to-fine levels, where each level is "specialized" in a different class distribution (e.g., more stuff than things classes at coarser levels). Two types of pyramidal outputs (i.e., unity and semantic pyramids) are "fused" into the final semantic output, where the unity pyramid indicates unity-cells (i.e., all pixels in such a cell share the same semantic label). The process ensures parsimony by predicting a relatively small number of labels for unity-cells (e.g., a large cell of grass) to build the final semantic output. In addition to the "output" representation, we design a coarse-to-fine contextual module to aggregate the "feature" representations from different levels. We validate the effectiveness…
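The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes hard label maps rather than network scores, square grids that nest exactly, and a simple rule in which each pixel takes its label from the coarsest level whose covering cell is marked as a unity-cell, with the finest level used as a fallback. The names `fuse_pyramid` and `upsample` are hypothetical.

```python
import numpy as np


def upsample(grid, scale):
    """Nearest-neighbour upsampling: each cell becomes a scale x scale block."""
    return np.repeat(np.repeat(grid, scale, axis=0), scale, axis=1)


def fuse_pyramid(semantic_levels, unity_levels, out_size):
    """Fuse coarse-to-fine label maps into one out_size x out_size label map.

    semantic_levels: list of square 2-D int arrays, ordered coarse to fine.
    unity_levels:    list of 2-D bool arrays with matching shapes; True marks
                     a unity-cell (all pixels in the cell share one label).
    Assumption (not from the paper): grids nest exactly, and the finest level
    is trusted wherever no coarser unity-cell has claimed the pixel.
    Returns the fused map and the number of cell labels actually used.
    """
    fused = np.full((out_size, out_size), -1, dtype=np.int64)
    assigned = np.zeros((out_size, out_size), dtype=bool)
    labels_used = 0
    last = len(semantic_levels) - 1

    for i, (sem, unity) in enumerate(zip(semantic_levels, unity_levels)):
        scale = out_size // sem.shape[0]
        sem_up = upsample(sem, scale)
        unity_up = upsample(unity, scale)
        if i == last:
            # Fallback: the finest level fills every remaining pixel.
            unity_up = np.ones_like(unity_up, dtype=bool)
        take = unity_up & ~assigned
        fused[take] = sem_up[take]
        assigned |= take
        # Each selected cell covers scale * scale output pixels.
        labels_used += int(take.sum()) // (scale * scale)

    return fused, labels_used


# Toy example: a 2x2 coarse level and a 4x4 fine level.
coarse_sem = np.array([[1, 2], [3, 0]])
coarse_unity = np.array([[True, False], [True, True]])
fine_sem = np.full((4, 4), 9)
fine_sem[0:2, 2:4] = [[2, 4], [4, 4]]
fine_unity = np.zeros((4, 4), dtype=bool)

fused, labels_used = fuse_pyramid(
    [coarse_sem, fine_sem], [coarse_unity, fine_unity], out_size=4
)
print(fused)
print(labels_used)
```

In this toy case three coarse unity-cells cover twelve of the sixteen pixels, so only seven cell labels are read in total instead of sixteen per-pixel labels, which is the kind of parsimony the abstract describes.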

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning