Iconographic Image Captioning for Artworks

Eva Cetinic

arXiv:2102.03942·cs.CV·February 9, 2021

TL;DR

This paper introduces a transformer-based model trained on a large-scale art dataset with Iconclass annotations to generate meaningful captions for artworks, addressing unique challenges in art image captioning.

Contribution

The paper presents a novel dataset and fine-tunes a vision-language model specifically on art images, improving the relevance of captions in an art-historical context.

Findings

01

Generated captions are more relevant to the art-historical context than captions produced by models trained on natural images

02

The model generalizes well to new artwork collections

03

Captions exhibit strong relevance to artistic genres

Abstract

Image captioning implies automatically generating textual descriptions of images based only on the visual input. Although this has been an extensively addressed research topic in recent years, not many contributions have been made in the domain of art historical data. In this particular context, the task of image captioning is confronted with various challenges such as the lack of large-scale datasets of image-text pairs, the complexity of meaning associated with describing artworks and the need for expert-level annotations. This work aims to address some of those challenges by utilizing a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography. The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network model on the image captioning…
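The abstract's step of turning concept annotations into clean textual descriptions might look roughly like the sketch below. It is only an illustration under stated assumptions, not the paper's actual preprocessing: the function name `labels_to_caption`, the cleaning rules (dropping parenthetical qualifiers, lowercasing, removing duplicates) and the example labels are all hypothetical, and the labels are made up rather than taken from Iconclass.

```python
import re


def labels_to_caption(labels):
    """Join concept label texts into one cleaned caption string.

    Hypothetical helper: the cleaning rules here are assumptions made
    for illustration, not the procedure described in the paper.
    """
    cleaned = []
    seen = set()
    for label in labels:
        # Drop parenthetical qualifiers such as "(portrait)".
        text = re.sub(r"\s*\([^)]*\)", "", label)
        # Collapse repeated whitespace and normalise case.
        text = re.sub(r"\s+", " ", text).strip().lower()
        # Trim stray punctuation left at either end.
        text = text.strip(" ,;.")
        # Keep each distinct phrase once, preserving first-seen order.
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return ", ".join(cleaned)


# Made-up label texts standing in for the annotations of one artwork.
example_labels = [
    "Adult man (portrait)",
    "landscape with  river",
    "adult man",
]
caption = labels_to_caption(example_labels)
print(caption)
```

Each image would then be paired with its resulting caption string to form one image-text training example. Joining with commas is just one simple choice, and a real pipeline may order or phrase the concepts differently.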

Code & Models

Repositories

Surojit-KB/ARTY
tf
