Aesthetically Relevant Image Captioning
Zhipeng Zhong, Fei Zhou, Guoping Qiu

TL;DR
This paper introduces ARIC, a novel image captioning method that incorporates aesthetic relevance scoring to generate more accurate and diverse aesthetic captions, supported by extensive experiments and a large new dataset.
Contribution
The paper proposes the Aesthetic Relevance Score (ARS) concept and the ARIC model, which integrates aesthetic relevance into image captioning to improve the accuracy and diversity of generated aesthetic captions.
Findings
Sentences with higher ARS predict aesthetic ratings more accurately.
ARIC generates more accurate and diverse aesthetic captions.
Extensive experiments validate the effectiveness of ARIC.
Abstract
Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic ratings to images, whilst image aesthetic captioning (IAC) aims to generate textual descriptions of the aesthetic aspects of images. In this paper, we study image AQA and IAC together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC). Based on the observation that most textual comments on an image are about objects and their interactions rather than aspects of aesthetics, we first introduce the concept of the Aesthetic Relevance Score (ARS) of a sentence and develop a model to automatically label a sentence with its ARS. We then use the ARS to design the ARIC model, which includes an ARS-weighted IAC loss function and an ARS-based diverse aesthetic caption selector (DACS). We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of…
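The abstract names an ARS-weighted IAC loss but does not give its formula here. As a rough illustration of the idea only, the sketch below weights each training caption's mean token negative log-likelihood by its ARS, so that aesthetically relevant sentences contribute more to the objective. The function name `ars_weighted_caption_loss`, the per-caption averaging, and the normalisation by the sum of weights are all assumptions made for this example, not the paper's definition.

```python
import math


def ars_weighted_caption_loss(token_log_probs, ars_scores):
    """Hypothetical ARS-weighted captioning loss (illustrative, not the paper's).

    token_log_probs: list of captions; each caption is a list of the
        log-probabilities the model assigned to its ground-truth tokens.
    ars_scores: one non-negative aesthetic relevance weight per caption.
    Returns the ARS-weighted average of per-caption mean token NLL.
    """
    if len(token_log_probs) != len(ars_scores):
        raise ValueError("need exactly one ARS score per caption")
    total_weight = sum(ars_scores)
    if total_weight <= 0:
        raise ValueError("ARS scores must sum to a positive value")

    weighted_sum = 0.0
    for log_probs, ars in zip(token_log_probs, ars_scores):
        if not log_probs:
            raise ValueError("captions must contain at least one token")
        # Mean negative log-likelihood over the caption's tokens.
        nll = -sum(log_probs) / len(log_probs)
        # Assumption: the ARS acts as a simple multiplicative weight.
        weighted_sum += ars * nll
    return weighted_sum / total_weight


# Toy usage: the second caption has a higher ARS, so its loss dominates.
captions = [[math.log(0.5), math.log(0.5)], [math.log(0.25)]]
loss = ars_weighted_caption_loss(captions, [1.0, 3.0])
```

With equal ARS values this reduces to the ordinary mean of per-caption losses, which is one way to check that the weighting alone is responsible for any change in behaviour.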
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
