Leveraging Pretrained Image Classifiers for Language-Based Segmentation

David Golub; Ahmed El-Kishky; Roberto Martín-Martín

arXiv:1911.00830·cs.CV·March 12, 2020

TL;DR

This paper introduces a segmentation method that uses pretrained image classifiers and language semantics to enable zero-shot segmentation of new object classes without retraining.

Contribution

It presents a novel approach that injects visual priors from pretrained classifiers into segmentation models, allowing generalization to unseen classes.

Findings

01 Effective zero-shot segmentation for unseen classes

02 Visual priors improve segmentation accuracy

03 Language semantics enhance prior quality

Abstract

Current semantic segmentation models cannot easily generalize to object classes unseen at training time: they require additional annotated images and retraining. We propose a novel segmentation model that injects visual priors into semantic segmentation architectures, allowing them to segment new target labels without retraining. As visual priors, we use the activations of pretrained image classifiers, which provide noisy indications of the spatial location of both the target object and distractor objects in the scene. We leverage language semantics to obtain these activations for a target label unseen by the classifier. Further experiments show that the visual priors obtained via language semantics for both relevant and distracting objects are key to our performance.
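The abstract does not say how language semantics are turned into activations for an unseen label, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each classifier class has a spatial activation map and a word embedding, and it builds a prior for the unseen label by blending the maps of the classes whose embeddings are closest to it. The names `cosine`, `language_guided_prior`, and `top_k`, as well as the toy embeddings and maps, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def language_guided_prior(class_maps, class_embeddings, target_embedding, top_k=2):
    """Blend activation maps of the classifier classes nearest to the target label.

    class_maps:       dict mapping class name -> H x W activation map (list of rows)
    class_embeddings: dict mapping class name -> word vector (assumed to be given)
    target_embedding: word vector of a label the classifier was never trained on
    """
    # Language step: score every known class against the unseen label.
    sims = {name: cosine(target_embedding, vec)
            for name, vec in class_embeddings.items()}
    nearest = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Keep only non-negative similarities and use them as blending weights.
    weights = {name: max(sims[name], 0.0) for name in nearest}
    total = sum(weights.values())

    first = class_maps[nearest[0]]
    height, width = len(first), len(first[0])
    prior = [[0.0] * width for _ in range(height)]
    if total == 0:
        return prior  # no related class found: uninformative prior

    # Visual step: similarity-weighted average of the selected activation maps.
    for name in nearest:
        w = weights[name] / total
        for r in range(height):
            for c in range(width):
                prior[r][c] += w * class_maps[name][r][c]
    return prior


# Toy data: the unseen label's vector lies close to "cat" and far from "car".
toy_embeddings = {"cat": [1.0, 0.0], "car": [0.0, 1.0]}
toy_maps = {
    "cat": [[0.9, 0.1], [0.0, 0.0]],
    "car": [[0.0, 0.0], [0.1, 0.8]],
}
toy_prior = language_guided_prior(toy_maps, toy_embeddings, [1.0, 0.1], top_k=1)
```

In this toy case the prior follows the "cat" map, so it peaks in the top-left cell. Priors for distractor objects could be produced the same way from the labels of other objects in the scene, and the resulting maps would then be passed to the segmentation network as extra input; how that injection is done is not described in the abstract.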
