RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang, Hao Zhang

TL;DR
RESAnything introduces a zero-shot, open-vocabulary approach for referring expression segmentation that leverages attribute prompting and large language models to handle complex, implicit, and part-level queries without training on specific annotations.
Contribution
RESAnything is described as the first zero-shot RES method to use LLMs with attribute prompting, which enables deeper reasoning about object and part attributes for arbitrary referring expressions.
Findings
Outperforms existing zero-shot RES methods on standard benchmarks.
Performs significantly better on complex implicit and part-level queries.
Introduces a new RES benchmark dataset with 3K instances.
Abstract
We present an open-vocabulary and zero-shot method for arbitrary referring expression segmentation (RES), targeting input expressions that are more general than what prior works were designed to handle. Specifically, our inputs encompass both object- and part-level labels as well as implicit references pointing to properties or qualities of object/part function, design, style, material, etc. Our model, coined RESAnything, leverages Chain-of-Thoughts (CoT) reasoning, where the key idea is attribute prompting. We generate detailed descriptions of object/part attributes including shape, color, and location for potential segment proposals through systematic prompting of a large language model (LLM), where the proposals are produced by a foundational image segmentation model. Our approach encourages deep reasoning about object or part attributes related to function, style, design, etc.,…
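The attribute-prompting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Proposal` class, the prompt wording, the `ask_model` callable, and the word-overlap scorer are all hypothetical stand-ins. In the method described above, proposals come from a foundational segmentation model and the descriptions and matching come from an LLM; here a caller-supplied function plays the LLM role and a simple lexical overlap replaces the reasoning step.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Attributes named in the abstract; the real prompt set may differ.
ATTRIBUTES = ("shape", "color", "location")


@dataclass
class Proposal:
    """Hypothetical segment proposal; a real one would carry a binary mask."""
    proposal_id: int
    mask: object = None


def build_attribute_prompt(attribute: str) -> str:
    """Assumed prompt wording, for illustration only."""
    return f"Describe the {attribute} of the highlighted region in one short sentence."


def describe_proposal(
    proposal: Proposal, ask_model: Callable[[Proposal, str], str]
) -> Dict[str, str]:
    """Query the model once per attribute and collect the answers."""
    return {a: ask_model(proposal, build_attribute_prompt(a)) for a in ATTRIBUTES}


def _words(text: str) -> set:
    """Lowercased alphanumeric tokens, with punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(expression: str, description: Dict[str, str]) -> float:
    """Fraction of expression words found in the description (toy matcher)."""
    expr_words = _words(expression)
    if not expr_words:
        return 0.0
    desc_words = _words(" ".join(description.values()))
    return len(expr_words & desc_words) / len(expr_words)


def select_proposal(
    expression: str,
    proposals: Sequence[Proposal],
    ask_model: Callable[[Proposal, str], str],
) -> Proposal:
    """Return the proposal whose attribute description best matches the expression."""
    if not proposals:
        raise ValueError("at least one proposal is required")
    scored = [
        (overlap_score(expression, describe_proposal(p, ask_model)), p)
        for p in proposals
    ]
    # max() keeps the first proposal on ties.
    return max(scored, key=lambda pair: pair[0])[1]
```

To try it, pass any function that maps a proposal and a prompt to a text answer as `ask_model`. Replacing `overlap_score` with a second model query that compares the expression against each description would be closer in spirit to the reasoning the abstract describes, though the exact procedure is not given in this summary.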
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
