Aesthetically Relevant Image Captioning

Zhipeng Zhong; Fei Zhou; Guoping Qiu

arXiv:2211.15378·cs.CV·November 29, 2022

TL;DR

This paper introduces ARIC, a novel image captioning method that incorporates aesthetic relevance scoring to generate more accurate and diverse aesthetic captions, supported by extensive experiments and a large new dataset.

Contribution

The paper proposes the Aesthetic Relevance Score (ARS) of a sentence and the ARIC model, which uses ARS in a weighted captioning loss and in a diverse aesthetic caption selector (DACS) so that generated captions are more aesthetically relevant and more diverse.

Findings

01 Sentences with a higher ARS predict aesthetic ratings more accurately.

02 ARIC generates more accurate and diverse aesthetic captions.

03 Extensive experiments validate the effectiveness of ARIC.

Abstract

Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic ratings to images whilst image aesthetic captioning (IAC) aims to generate textual descriptions of the aesthetic aspects of images. In this paper, we study image AQA and IAC together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC). Based on the observation that most textual comments of an image are about objects and their interactions rather than aspects of aesthetics, we first introduce the concept of Aesthetic Relevance Score (ARS) of a sentence and have developed a model to automatically label a sentence with its ARS. We then use the ARS to design the ARIC model which includes an ARS weighted IAC loss function and an ARS based diverse aesthetic caption selector (DACS). We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of…
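The abstract names an ARS weighted IAC loss but this page does not give its formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each training caption has a non-negative ARS, that the per-caption loss is the mean negative log-likelihood of its ground-truth tokens, and that captions are combined as an ARS-weighted average. The function name `ars_weighted_caption_loss` and its arguments are hypothetical.

```python
import math


def ars_weighted_caption_loss(token_log_probs, ars_scores):
    """Hypothetical ARS-weighted captioning loss (illustrative sketch only).

    token_log_probs: list of captions; each caption is a list of the
        log-probabilities the model assigned to its ground-truth tokens.
    ars_scores: one non-negative aesthetic relevance weight per caption.
    Returns the ARS-weighted average of per-caption mean token NLL.
    """
    if len(token_log_probs) != len(ars_scores):
        raise ValueError("need exactly one ARS per caption")
    if any(len(caption) == 0 for caption in token_log_probs):
        raise ValueError("captions must contain at least one token")
    total_weight = sum(ars_scores)
    if total_weight <= 0:
        raise ValueError("ARS weights must sum to a positive value")

    weighted_sum = 0.0
    for log_probs, ars in zip(token_log_probs, ars_scores):
        # Mean negative log-likelihood of this caption's tokens.
        caption_nll = -sum(log_probs) / len(log_probs)
        # Assumption: captions with higher ARS contribute more to the loss.
        weighted_sum += ars * caption_nll
    return weighted_sum / total_weight


# Two toy captions: the second is predicted less confidently.
captions = [[math.log(0.5), math.log(0.5)], [math.log(0.25), math.log(0.25)]]
equal_weight_loss = ars_weighted_caption_loss(captions, [1.0, 1.0])
first_only_loss = ars_weighted_caption_loss(captions, [1.0, 0.0])
```

With equal weights both captions count the same; setting the second caption's ARS to zero removes it from the objective, which is the intuition behind down-weighting comments that describe objects rather than aesthetics.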

Code & Models

Repositories

pengzai/aric (PyTorch, official)

Videos

Aesthetically Relevant Image Captioning · underline

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques