In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation

Dahyun Kang; Minsu Cho

arXiv:2408.04961·cs.CV·August 12, 2024

TL;DR

This paper introduces lazy visual grounding, a two-stage approach to open-vocabulary semantic segmentation that first discovers object masks without supervision and then grounds them in text. It needs no additional training and outperforms pixel-to-text classification methods.

Contribution

The paper proposes a two-stage method that first discovers object masks with iterative Normalized cuts and only afterward assigns text to the discovered masks, eliminating the need for extra training and improving segmentation accuracy.
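The mask-discovery stage can be illustrated with a minimal NumPy sketch of a Normalized cut over image-patch features. This is not the authors' implementation: the thresholded cosine affinity, the values of `tau` and `eps`, the rule of peeling off the smaller side, and the fixed `max_masks` budget are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def ncut_bipartition(feats, tau=0.2, eps=1e-5):
    """Split patches into two groups with one Normalized cut.

    feats: (n_patches, dim) array of patch features.
    tau, eps: assumed affinity threshold and floor (illustrative values).
    Returns a boolean array marking one side of the cut.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    # Assumed affinity: strong edge if cosine similarity exceeds tau.
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, vecs = np.linalg.eigh(L_sym)
    y = d_inv_sqrt * vecs[:, 1]
    return y > y.mean()


def iterative_ncut(feats, max_masks=3):
    """Peel off one mask per round, then cut the remaining patches again."""
    labels = np.full(len(feats), -1)
    remaining = np.arange(len(feats))
    for k in range(max_masks - 1):
        if remaining.size < 2:
            break
        side = ncut_bipartition(feats[remaining])
        if side.all() or not side.any():
            break  # degenerate cut, nothing left to separate
        # Assumption: treat the smaller side as the newly found mask.
        if side.sum() > (~side).sum():
            side = ~side
        labels[remaining[side]] = k
        remaining = remaining[~side]
    # Whatever is left becomes the final mask.
    labels[remaining] = labels.max() + 1
    return labels
```

A fixed `max_masks` is the crudest possible stopping rule and will over-segment a homogeneous region if set too high; a practical system would stop based on the quality of each cut, and the criterion used in the paper is not stated on this page.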

Findings

01 Achieves strong performance on five public datasets: Pascal VOC, Pascal Context, COCO-object, COCO-stuff, and ADE 20K.

02 Produces precise, visually appealing segmentation results.

03 Requires no additional training.

Abstract

We present lazy visual grounding, a two-stage approach of unsupervised object mask discovery followed by object grounding, for open-vocabulary semantic segmentation. Plenty of the previous art casts this task as pixel-to-text classification without object-level comprehension, leveraging the image-to-text classification capability of pretrained vision-and-language models. We argue that visual objects are distinguishable without the prior text information as segmentation is essentially a vision task. Lazy visual grounding first discovers object masks covering an image with iterative Normalized cuts and then later assigns text on the discovered objects in a late interaction manner. Our model requires no additional training yet shows great performance on five public datasets: Pascal VOC, Pascal Context, COCO-object, COCO-stuff, and ADE 20K. Especially, the visually appealing segmentation…
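The late text assignment described in the abstract can be sketched as follows, assuming object masks have already been discovered and that patch and text embeddings come from the same pretrained vision-and-language model. The mean pooling, the cosine-similarity matching, and the function name `assign_text_to_masks` are illustrative assumptions, not details confirmed by this page.

```python
import numpy as np


def assign_text_to_masks(mask_labels, patch_embeds, text_embeds):
    """Assign one class name (by index) to each discovered mask.

    mask_labels: (n_patches,) integer mask id per patch.
    patch_embeds: (n_patches, dim) patch embeddings (hypothetical input).
    text_embeds: (n_classes, dim) class-name embeddings (hypothetical input).
    Returns a dict mapping mask id to the best-matching class index.
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    mask_ids = np.unique(mask_labels)
    # Assumption: represent each mask by the mean of its patch embeddings.
    pooled = np.stack(
        [patch_embeds[mask_labels == m].mean(axis=0) for m in mask_ids]
    )
    # Cosine similarity between every mask and every class name.
    sim = l2norm(pooled) @ l2norm(text_embeds).T
    return dict(zip(mask_ids.tolist(), sim.argmax(axis=1).tolist()))


# Toy usage: two masks, two class names, 2-D embeddings.
mask_labels = np.array([0, 0, 1, 1])
patch_embeds = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
text_embeds = np.array([[0.0, 1.0], [1.0, 0.0]])
print(assign_text_to_masks(mask_labels, patch_embeds, text_embeds))
# → {0: 1, 1: 0}
```

Text enters only after the masks are fixed, so the classification decision is made once per object rather than once per pixel, which is the contrast with pixel-to-text classification that the abstract draws.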

Code & Models

Repositories

dahyun-kang/lazygrounding
PyTorch · Official