MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation
Zhenchao Jin, Dongdong Yu, Zehuan Yuan, Lequan Yu

TL;DR
MCIBI++ introduces a novel approach for semantic segmentation that leverages dataset-level category representations beyond individual images, significantly enhancing pixel-level accuracy and achieving state-of-the-art results.
Contribution
The paper proposes MCIBI++, a new framework that incorporates dataset-level contextual information through a dynamic memory module and an iterative inference strategy, improving segmentation performance.
Findings
Achieved state-of-the-art results on seven benchmarks.
Enhanced segmentation accuracy with dataset-level context aggregation.
Effective extension to video semantic segmentation.
Abstract
Co-occurrent visual patterns make context aggregation an essential paradigm for semantic segmentation. Existing studies focus on modeling contexts within an image while neglecting the valuable semantics of the corresponding category beyond the image. To this end, we propose a novel soft mining contextual information beyond image paradigm, named MCIBI++, to further boost the pixel-level representations. Specifically, we first set up a dynamically updated memory module to store the dataset-level distribution information of various categories and then leverage this information to yield the dataset-level category representations during the network forward pass. After that, we generate a class probability distribution for each pixel representation and conduct the dataset-level context aggregation with the class probability distribution as weights. Finally, the original pixel representations are…
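The steps named in the abstract (a dynamically updated memory, dataset-level category representations, and probability-weighted aggregation) can be illustrated with a small NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the memory is taken to hold a per-class mean and variance, it is updated with an exponential moving average, category representations are drawn from a Gaussian, and all function names (`update_memory`, `class_representations`, `aggregate_context`) are hypothetical. The final step, which is truncated in the abstract, is deliberately left out.

```python
import numpy as np

def update_memory(mean, var, feats, labels, momentum=0.9):
    """Update per-class statistics in place.
    ASSUMPTION: an EMA over per-class mean/variance; the source only says
    the memory is 'dynamically updated'."""
    for c in np.unique(labels):
        x = feats[labels == c]                      # (n_c, D) pixels of class c
        mean[c] = momentum * mean[c] + (1 - momentum) * x.mean(axis=0)
        var[c] = momentum * var[c] + (1 - momentum) * x.var(axis=0)
    return mean, var

def class_representations(mean, var, rng):
    """Draw one representation per class.
    ASSUMPTION: Gaussian sampling from the stored mean and variance."""
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_context(class_logits, class_reps):
    """Weight the class representations by each pixel's class probabilities."""
    probs = softmax(class_logits, axis=-1)          # (N, K)
    return probs @ class_reps, probs                # (N, D), (N, K)

# Toy usage: K classes, D-dimensional features, N pixels (random data).
rng = np.random.default_rng(0)
K, D, N = 3, 4, 5
mean, var = np.zeros((K, D)), np.ones((K, D))
feats = rng.standard_normal((N, D))
labels = np.array([0, 1, 1, 2, 0])
mean, var = update_memory(mean, var, feats, labels)
reps = class_representations(mean, var, rng)
logits = rng.standard_normal((N, K))
context, probs = aggregate_context(logits, reps)
```

Because the weights are a full probability distribution rather than a hard class assignment, each pixel's context is a convex combination of all category representations, which is one reading of the "soft" in the paradigm's name.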
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
