Seeing the Undefined: Chain-of-Action for Generative Semantic Labels
Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu

TL;DR
This paper introduces Generative Semantic Labels (GSLs), a new task of generating a comprehensive set of semantic labels for an image without a predefined label set, and proposes Chain-of-Action (CoA), a method that improves label generation by sequentially enriching contextual information.
Contribution
The paper presents GSLs as a novel task and introduces CoA, a new method that decomposes label generation into sequential actions to enhance contextual understanding and accuracy.
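The idea of decomposing label generation into sequential actions can be sketched as a loop in which each action's answer is added to the context seen by the next action. This is a minimal illustration only, not the authors' implementation: the `chain_of_action` function, the `toy_vlm` stub, and the two action prompts are hypothetical, since the actual actions used by CoA are not given in this summary.

```python
from typing import Callable, List

# Hypothetical stand-in for a vision-language model call: (image, prompt) -> text.
VLM = Callable[[str, str], str]


def chain_of_action(image: str, actions: List[str], vlm: VLM) -> List[str]:
    """Run the actions in order, feeding earlier answers to later actions.

    The final answer is assumed (for this sketch) to be a comma-separated
    list of labels.
    """
    context: List[str] = []
    for action in actions:
        # Each prompt contains everything gathered so far plus the new action.
        prompt = " ".join(context + [action])
        context.append(vlm(image, prompt))
    if not context:
        return []
    return [label.strip() for label in context[-1].split(",") if label.strip()]


def toy_vlm(image: str, prompt: str) -> str:
    """Canned responses so the sketch runs without a real model."""
    if "labels" in prompt.lower():
        return "dog, park, running, dog on grass"
    return "A dog runs across a grassy park."


# Hypothetical two-step chain: describe first, then label using that description.
actions = ["Describe the scene.", "List semantic labels."]
labels = chain_of_action("img.jpg", actions, toy_vlm)
print(labels)
```

In a real setting, `toy_vlm` would be replaced by a call to an actual vision-language model, and the labels would cover objects, scenes, attributes, and relationships without being drawn from a fixed vocabulary.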
Findings
CoA significantly improves semantic label accuracy.
GSLs enables richer image content representation.
The method outperforms existing approaches on benchmark datasets.
Abstract
Recent advances in vision-language models (VLMs) have demonstrated remarkable capabilities in image classification by leveraging predefined sets of labels to construct text prompts for zero-shot reasoning. However, these approaches face significant limitations in undefined domains, where the label space is vocabulary-unknown and composite. We thus introduce Generative Semantic Labels (GSLs), a novel task that aims to predict a comprehensive set of semantic labels for an image without being constrained by a predefined label set. Unlike traditional zero-shot classification, GSLs generates multiple semantic-level labels, encompassing objects, scenes, attributes, and relationships, thereby providing a richer and more accurate representation of image content. In this paper, we propose Chain-of-Action (CoA), an innovative method designed to tackle the GSLs task. CoA is motivated by the…
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
