Learning Rich Representations For Structured Visual Prediction Tasks

Mohammadreza Mostajabi

arXiv:1908.11820·cs.CV·September 2, 2019

Open Access

TL;DR

This paper introduces a zoom-out feature approach for learning rich image representations that improve structured visual prediction tasks like segmentation, avoiding complex inference and enabling weakly supervised learning.

Contribution

It proposes a novel zoom-out feature method that captures multi-scale information, enhancing prediction accuracy and enabling weakly supervised semantic segmentation from image-level tags.

Findings

01

Achieves competitive segmentation accuracy with modern neural architectures.

02

Enables category-level segmentation using only image-level labels.

03

Introduces data-driven regularization via autoencoder-based label space modeling.

Abstract

We describe an approach to learning rich representations for images that enables simple and effective predictors in a range of vision tasks involving spatially structured maps. Our key idea is to map small image elements to feature representations extracted from a sequence of nested regions of increasing spatial extent. These regions are obtained by "zooming out" from the pixel/superpixel all the way to scene-level resolution, and hence we call these zoom-out features. Applied to semantic segmentation and other structured prediction tasks, our approach exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference. Instead, image elements are classified by a feedforward multilayer network with skip-layer connections spanning the zoom-out levels. When used in conjunction with…
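
The nested-region idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `window_mean` and `zoom_out_features`, the square windows, the `radii` parameter, and the use of mean intensity as the per-region feature are all assumptions made for illustration, standing in for whatever learned features are extracted at each zoom-out level. The sketch only shows the structure: one feature per region of increasing extent, ending with a scene-level feature, concatenated into a single vector for the image element.

```python
from typing import List, Sequence

# Grayscale image as rows of intensities (illustrative representation).
Image = List[List[float]]


def window_mean(img: Image, row: int, col: int, radius: int) -> float:
    """Mean intensity over a square window centered on (row, col), clipped to the image."""
    h, w = len(img), len(img[0])
    r0, r1 = max(0, row - radius), min(h, row + radius + 1)
    c0, c1 = max(0, col - radius), min(w, col + radius + 1)
    values = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


def zoom_out_features(
    img: Image, row: int, col: int, radii: Sequence[int] = (0, 1, 2)
) -> List[float]:
    """Concatenate one feature per nested region, from the pixel out to the whole image.

    Hypothetical stand-in: each level contributes a single mean-intensity value.
    """
    local = [window_mean(img, row, col, r) for r in radii]
    h, w = len(img), len(img[0])
    scene = sum(sum(line) for line in img) / (h * w)  # scene-level region
    return local + [scene]
```

On a 5x5 ramp image built as `[[float(r * 5 + c) for c in range(5)] for r in range(5)]`, calling `zoom_out_features(img, 0, 0)` returns `[0.0, 3.0, 6.0, 12.0]`: the value grows as the region widens from the corner pixel to the full image. In the approach described above, a vector of this kind would then be fed to a feedforward classifier for that image element.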

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Batch Normalization · Bottleneck Residual Block · Residual Connection · Convolution · Residual Block · Average Pooling · Concatenated Skip Connection · Global Average Pooling