Scene Parsing with Integration of Parametric and Non-parametric Models
Bing Shuai, Zhen Zuo, Gang Wang, Bing Wang

TL;DR
This paper introduces a scene parsing method combining CNN-based local feature learning with global scene constraints estimated through a non-parametric approach, achieving state-of-the-art results without post-processing.
Contribution
It proposes an integrated framework that combines parametric CNN models with non-parametric global scene constraints for improved scene parsing accuracy.
Findings
Achieves state-of-the-art results on the SiftFlow and Barcelona benchmarks.
Combines local CNN features with global scene context.
Produces high-quality label maps without post-processing.
Abstract
We adopt Convolutional Neural Networks (CNNs) as our parametric model to learn discriminative features and classifiers for local patch classification. Based on the occurrence frequency distribution of classes, an ensemble of CNNs (CNN-Ensemble) is learned, in which each CNN component focuses on learning different and complementary visual patterns. The local beliefs of pixels are output by CNN-Ensemble. Considering that visually similar pixels are indistinguishable under local context, we leverage the global scene semantics to alleviate the local ambiguity. The global scene constraint is mathematically achieved by adding a global energy term to the labeling energy function, and it is practically estimated in a non-parametric framework. A large-margin-based CNN metric learning method is also proposed for better global belief estimation. In the end, the integration of local and global…
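The idea of adding a global energy term to a per-pixel labeling energy can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact energy, so the form below (negative log local belief plus a weighted negative log global belief) is an assumption, as are the nearest-neighbor estimate of the global belief and all names (`global_belief`, `label_pixels`, `lam`, `k`). The local beliefs are assumed to be per-pixel class probabilities, such as a CNN ensemble might output.

```python
import numpy as np

def global_belief(query_feat, train_feats, train_class_hist, k=2, eps=1e-6):
    """Hypothetical non-parametric global belief.

    Finds the k training images closest to the query in a global feature
    space (Euclidean distance, an assumption) and pools their per-image
    class frequencies into one class distribution for the query image.
    """
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    pooled = train_class_hist[nearest].sum(axis=0) + eps  # smooth unseen classes
    return pooled / pooled.sum()

def label_pixels(local_probs, prior, lam=1.0, eps=1e-12):
    """Assumed energy: E(pixel, c) = -log local(pixel, c) - lam * log global(c).

    local_probs has shape (H, W, C); prior has shape (C,).
    Returns the (H, W) label map that minimizes the energy per pixel.
    """
    energy = -np.log(local_probs + eps) - lam * np.log(prior + eps)
    return energy.argmin(axis=-1)
```

With `lam=0` the labeling reduces to the per-pixel argmax of the local beliefs. A positive `lam` penalizes classes that rarely occur in the retrieved neighbor images, which is one way a locally ambiguous pixel could be resolved by scene-level context.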
