Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu

TL;DR
This paper introduces Abstract Scene Graphs (ASGs) for fine-grained control over image captioning, enabling diverse, intention-aware descriptions that follow user-specified graph structures.
Contribution
The paper proposes ASGs as a novel representation for user intentions and introduces the ASG2Caption model to generate controlled, diverse image captions based on these graphs.
Findings
Better controllability of captions using ASGs.
Significant improvement in caption diversity.
Enhanced performance on VisualGenome and MSCOCO datasets.
Abstract
Humans are able to describe image contents with coarse to fine details as they wish. However, most image captioning models are intention-agnostic and cannot actively generate diverse descriptions according to different user intentions. In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and control what the generated description should cover and how detailed it should be. The ASG is a directed graph consisting of three types of abstract nodes (object, attribute, relationship) grounded in the image without any concrete semantic labels. Thus it is easy to obtain either manually or automatically. From the ASG, we propose a novel ASG2Caption model, which is able to recognise user intentions and semantics in the graph, and therefore generate desired captions according to the graph structure. Our model achieves better…
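The ASG described in the abstract can be pictured as a small data structure. The sketch below is illustrative only and is not the authors' code: the class and method names are hypothetical, and the edge directions (object to attribute, subject to relationship to object) and the use of bounding boxes for grounding are assumptions made for the example. It shows only that nodes carry a type and an image region, never a semantic label.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed grounding format


class NodeType(Enum):
    OBJECT = "object"
    ATTRIBUTE = "attribute"
    RELATIONSHIP = "relationship"


@dataclass
class AbstractNode:
    node_type: NodeType
    box: Box  # image region the node is grounded in; no semantic label is stored


@dataclass
class AbstractSceneGraph:
    """Hypothetical container for an ASG: typed, label-free nodes plus directed edges."""
    nodes: List[AbstractNode] = field(default_factory=list)
    edges: List[Tuple[int, int]] = field(default_factory=list)  # (source, target) indices

    def add_object(self, box: Box) -> int:
        self.nodes.append(AbstractNode(NodeType.OBJECT, box))
        return len(self.nodes) - 1

    def add_attribute(self, obj: int) -> int:
        # Assumption: an attribute node shares its object's region,
        # with an edge pointing from the object to the attribute.
        if self.nodes[obj].node_type is not NodeType.OBJECT:
            raise ValueError("attributes can only be attached to object nodes")
        self.nodes.append(AbstractNode(NodeType.ATTRIBUTE, self.nodes[obj].box))
        idx = len(self.nodes) - 1
        self.edges.append((obj, idx))
        return idx

    def add_relationship(self, subj: int, obj: int) -> int:
        # Assumption: a relationship node is grounded in the union of both
        # object boxes, with edges subject -> relationship -> object.
        a, b = self.nodes[subj].box, self.nodes[obj].box
        union = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
        self.nodes.append(AbstractNode(NodeType.RELATIONSHIP, union))
        idx = len(self.nodes) - 1
        self.edges.extend([(subj, idx), (idx, obj)])
        return idx

    def count(self, node_type: NodeType) -> int:
        return sum(1 for n in self.nodes if n.node_type is node_type)
```

Under these assumptions, asking for a more detailed caption amounts to adding nodes: each `add_attribute` call on an object requests one more descriptive detail about it, and each `add_relationship` call requests that the caption mention how two objects relate.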
Videos
Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs · YouTube
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
