Leveraging Pretrained Image Classifiers for Language-Based Segmentation
David Golub, Ahmed El-Kishky, Roberto Martín-Martín

TL;DR
This paper introduces a segmentation method that uses pretrained image classifiers and language semantics to enable zero-shot segmentation of new object classes without retraining.
Contribution
It presents a novel approach that injects visual priors from pretrained classifiers into segmentation models, allowing generalization to unseen classes.
Findings
Effective zero-shot segmentation for unseen classes
Visual priors improve segmentation accuracy
Language semantics enhance prior quality
Abstract
Current semantic segmentation models cannot easily generalize to object classes unseen at training time: they require additional annotated images and retraining. We propose a novel segmentation model that injects visual priors into semantic segmentation architectures, allowing them to segment new target labels without retraining. As visual priors, we use the activations of pretrained image classifiers, which provide noisy indications of the spatial location of both the target object and distractor objects in the scene. We leverage language semantics to obtain these activations for a target label unseen by the classifier. Further experiments show that the visual priors obtained via language semantics for both relevant and distracting objects are key to our performance.
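One way to picture how language semantics could yield a visual prior for a label the classifier never saw is sketched below. This is an illustrative assumption, not the authors' implementation: the abstract does not specify the mechanism. The sketch assumes per-class activation maps from a classifier and word embeddings for each class name are already available, and it blends the maps of the classes most similar to the target label. The function names, the `top_k` parameter, and the similarity-weighted averaging are all hypothetical. Distractor priors, which the abstract also mentions, are not modeled here.

```python
import numpy as np


def cosine_similarity(target_vec, class_vecs):
    """Cosine similarity between one target embedding and each row of class_vecs."""
    target = target_vec / np.linalg.norm(target_vec)
    classes = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    return classes @ target


def language_guided_prior(activation_maps, class_vecs, target_vec, top_k=2):
    """Hypothetical prior for an unseen label.

    activation_maps: (C, H, W) per-class spatial activations from a classifier.
    class_vecs:      (C, D) word embeddings of the classifier's class names.
    target_vec:      (D,) word embedding of the unseen target label.
    Returns an (H, W) map: a similarity-weighted average of the activation
    maps belonging to the top_k classes closest to the target label.
    """
    sims = cosine_similarity(target_vec, class_vecs)
    top = np.argsort(sims)[::-1][:top_k]      # most similar classes first
    weights = np.clip(sims[top], 0.0, None)   # ignore negatively related classes
    total = weights.sum()
    if total == 0.0:
        # No known class is positively related to the target label.
        return np.zeros(activation_maps.shape[1:])
    weights = weights / total
    # Contract the class axis: (k,) with (k, H, W) gives (H, W).
    return np.tensordot(weights, activation_maps[top], axes=1)
```

With toy inputs, a target embedding that matches one known class exactly and is orthogonal or opposite to the others simply returns that class's activation map. In practice the embeddings would come from a pretrained word-vector model, and the resulting map would be passed to the segmentation network as an extra input.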
