CASINet: Content-Adaptive Scale Interaction Networks for Scene Parsing
Xin Jin, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zhibo Chen

TL;DR
This paper introduces CASINet, a scene parsing network in which multi-scale features interact adaptively across scales and spatial positions, improving pixel-level semantic prediction accuracy.
Contribution
We propose a Content-Adaptive Scale Interaction Network with modules for explicit scale interaction and adaptive scale selection, enhancing multi-scale feature utilization for scene parsing.
Findings
Achieves state-of-the-art results on Cityscapes, ADE20K, and LIP datasets.
Demonstrates the effectiveness of the CSI and SA modules through ablation studies.
Outperforms existing multi-scale methods in scene parsing accuracy.
Abstract
Objects at different spatial positions in an image exhibit different scales. Adaptive receptive fields are expected to capture suitable ranges of context for accurate pixel-level semantic prediction. Recently, atrous convolution with different dilation rates has been used to generate multi-scale features through several branches, which are then fused for prediction. However, there is a lack of explicit interaction among the branches of different scales to adaptively make full use of the contexts. In this paper, we propose a Content-Adaptive Scale Interaction Network (CASINet) to exploit the multi-scale features for scene parsing. We build CASINet based on the classic Atrous Spatial Pyramid Pooling (ASPP) module, followed by a proposed contextual scale interaction (CSI) module, and a scale adaptation (SA) module. Specifically, in the CSI module, for each spatial position of some…
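The multi-branch atrous convolution that the abstract builds on can be illustrated with a small toy. The sketch below is not the paper's implementation: it works on a 1D signal instead of a 2D feature map, uses a fixed kernel rather than learned weights, and fuses branches by plain averaging, all of which are assumptions made for brevity. The function names and the dilation rates are likewise hypothetical. It only shows how the same kernel covers a wider context as the dilation rate grows.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Same-length 1D atrous (dilated) convolution with zero padding.

    Computed as cross-correlation, the convention used by deep learning
    frameworks; for a symmetric kernel the two are identical.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Span of input samples seen by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_scale_branches(signal, kernel, dilations):
    """One branch per dilation rate, loosely mirroring ASPP-style branches."""
    return {d: atrous_conv1d(signal, kernel, d) for d in dilations}


def fuse_by_average(branches):
    """Toy fusion (an assumption): element-wise mean across branches."""
    n = len(branches)
    return [sum(vals) / n for vals in zip(*branches.values())]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1, 1, 1]
    branches = multi_scale_branches(impulse, kernel, dilations=(1, 2, 4))
    for d, response in branches.items():
        print(d, receptive_field(len(kernel), d), response)
    print(fuse_by_average(branches))
```

Averaging treats every branch identically at every position, so it is a content-independent fusion. The abstract's motivation is that such fusion lacks explicit, adaptive interaction among the scale branches, which is the gap the CSI and SA modules are proposed to address.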
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Spatial Pyramid Pooling · Dilated Convolution · Convolution
