Specialize and Fuse: Pyramidal Output Representation for Semantic Segmentation
Chi-Wei Hsiao, Cheng Sun, Hwann-Tzong Chen, Min Sun

TL;DR
This paper introduces a pyramidal output representation with a "specialize and fuse" process for semantic segmentation. It predicts fewer labels by reusing one label per unity-cell and reports state-of-the-art results on ADE20K, COCO-Stuff, and Pascal-Context.
Contribution
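It proposes a pyramidal output representation, made up of a unity pyramid and a semantic pyramid that are fused into the final semantic output, together with a coarse-to-fine contextual module that aggregates features from different levels.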
Findings
Achieves state-of-the-art performance on ADE20K, COCO-Stuff, and Pascal-Context datasets.
Demonstrates the effectiveness of the pyramidal output and fusion modules through ablation studies.
Reduces the number of labels to predict by assigning a single label to each unity-cell, i.e., a cell whose pixels all share the same semantic label.
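To make the unity-cell idea concrete, here is a minimal NumPy sketch that marks which cells of a label map are unity-cells at a given cell size. The function name `unity_mask`, the square non-overlapping cells, and the toy label map are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np

def unity_mask(labels: np.ndarray, cell: int) -> np.ndarray:
    """Return a boolean grid: True where every pixel in the cell shares one label.

    Hypothetical helper; assumes the map height and width are multiples of `cell`.
    """
    h, w = labels.shape
    assert h % cell == 0 and w % cell == 0, "map size must be a multiple of cell"
    # View the map as (cell rows, pixels per cell row, cell cols, pixels per cell col).
    blocks = labels.reshape(h // cell, cell, w // cell, cell)
    # A cell is a unity-cell when its minimum and maximum labels coincide.
    return blocks.max(axis=(1, 3)) == blocks.min(axis=(1, 3))

# Toy 8x8 label map: class 0 everywhere except a small class-1 patch.
labels = np.zeros((8, 8), dtype=np.int64)
labels[0:2, 0:2] = 1

mask = unity_mask(labels, cell=4)   # 2x2 grid of 4x4 cells
n_unity = int(mask.sum())           # cells that need only one label each
```

In this toy map, three of the four 4x4 cells are unity-cells, so three labels cover 48 of the 64 pixels and only the remaining cell has to be resolved at a finer level.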
Abstract
We present a novel pyramidal output representation to ensure parsimony with our "specialize and fuse" process for semantic segmentation. A pyramidal "output" representation consists of coarse-to-fine levels, where each level "specializes" in a different class distribution (e.g., more stuff than things classes at coarser levels). Two types of pyramidal outputs (i.e., unity and semantic pyramid) are "fused" into the final semantic output, where the unity pyramid indicates unity-cells (i.e., all pixels in such a cell share the same semantic label). The process ensures parsimony by predicting a relatively small number of labels for unity-cells (e.g., a large cell of grass) to build the final semantic output. In addition to the "output" representation, we design a coarse-to-fine contextual module to aggregate the "features" representation from different levels. We validate the effectiveness…
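The fusion of the two pyramids can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes hard (already thresholded) unity masks and hard label maps, assumes each pixel takes its label from the coarsest level whose cell is marked as a unity-cell, and assumes the finest level supplies any pixel left over. The names `upsample` and `fuse_pyramids` are hypothetical.

```python
import numpy as np

def upsample(grid: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling: repeat every cell `scale` times per axis."""
    return np.repeat(np.repeat(grid, scale, axis=0), scale, axis=1)

def fuse_pyramids(unity_pyramid, semantic_pyramid, out_size: int) -> np.ndarray:
    """Fuse coarse-to-fine unity masks and label maps into one label map.

    Assumption: levels are square, ordered coarse to fine, and `out_size`
    is a multiple of every level's side length.
    """
    fused = np.full((out_size, out_size), -1, dtype=np.int64)  # -1 = unassigned
    last = len(semantic_pyramid) - 1
    for i, (unity, sem) in enumerate(zip(unity_pyramid, semantic_pyramid)):
        scale = out_size // sem.shape[0]
        up_sem = upsample(sem, scale)
        up_unity = upsample(unity, scale)
        if i == last:
            # Assumption: the finest level labels whatever is still unassigned.
            up_unity = np.ones_like(up_unity, dtype=bool)
        take = up_unity & (fused == -1)  # coarser levels keep priority
        fused[take] = up_sem[take]
    return fused

# Toy pyramid with three levels: 1x1, 2x2, 4x4.
unity_pyramid = [
    np.array([[False]]),
    np.array([[False, True], [True, True]]),
    np.ones((4, 4), dtype=bool),
]
fine = np.full((4, 4), 7, dtype=np.int64)
fine[0:2, 0:2] = [[1, 2], [2, 2]]
semantic_pyramid = [
    np.array([[0]], dtype=np.int64),
    np.array([[9, 2], [2, 3]], dtype=np.int64),
    fine,
]

fused = fuse_pyramids(unity_pyramid, semantic_pyramid, out_size=4)
```

Only the top-left quadrant is read from the finest level here; the other three quadrants each reuse a single label from the 2x2 level, which is the parsimony the abstract describes.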
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
