KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph

Yanbei Jiang; Krista A. Ehinger; Jey Han Lau

arXiv:2409.10921 · cs.CV · September 18, 2024

TL;DR

KALE is a novel artwork image captioning system that improves caption quality by integrating artwork metadata, both as textual input and through a heterogeneous knowledge graph trained with a cross-modal alignment loss, and it outperforms existing models in CIDEr score.

Contribution

This work introduces KALE, a model that augments a vision-language model with artwork metadata, supplied both as text and as a multimodal heterogeneous knowledge graph, to improve artwork captioning.

Findings

01. KALE achieves higher CIDEr scores than state-of-the-art methods.

02. Incorporating metadata via a knowledge graph improves caption accuracy.

03. The cross-modal alignment loss strengthens the correspondence between images and metadata.
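The abstract below breaks off before defining the cross-modal alignment loss, so its exact form is not stated on this page. As an illustration only, a common way to align two modalities is a symmetric contrastive (InfoNCE, CLIP-style) loss over matched pairs. The sketch assumes that form, pairing each image embedding with the graph representation of the same artwork; the names `alignment_loss` and `temperature=0.07` are hypothetical, and this is not claimed to be KALE's actual loss.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: Vector, index: int) -> float:
    # Numerically stable log-softmax evaluated at a single position.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def alignment_loss(image_embs: List[Vector], graph_embs: List[Vector],
                   temperature: float = 0.07) -> float:
    """Hypothetical symmetric InfoNCE loss; row i of each list is a matched pair."""
    if len(image_embs) != len(graph_embs) or not image_embs:
        raise ValueError("need the same non-zero number of image and graph embeddings")
    imgs = [_normalize(v) for v in image_embs]
    grfs = [_normalize(v) for v in graph_embs]
    n = len(imgs)
    # sim[i][j] = cosine(image i, graph representation j) / temperature
    sim = [[sum(a * b for a, b in zip(imgs[i], grfs[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-graph: each image should score its own artwork's graph node highest.
    i2g = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Graph-to-image: same matrix, read column-wise.
    g2i = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2g + g2i) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(alignment_loss(images, matched) < alignment_loss(images, swapped))  # True
```

Pulling matched image/graph pairs together while pushing apart mismatched pairs in the same batch is what makes this a plausible reading of "alignment"; the loss is low when each graph representation is closest to its own artwork's image.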

Abstract

Exploring the narratives conveyed by fine-art paintings is a challenge in image captioning, where the goal is to generate descriptions that not only precisely represent the visual content but also offer an in-depth interpretation of the artwork's meaning. The task is particularly complex for artwork images because of their diverse interpretations and the varied aesthetic principles of different artistic schools and styles. In response, we present KALE (Knowledge-Augmented vision-Language model for artwork Elaborations), a novel approach that enhances existing vision-language models by integrating artwork metadata as additional knowledge. KALE incorporates the metadata in two ways: first as direct textual input, and second through a multimodal heterogeneous knowledge graph. To optimize the learning of graph representations, we introduce a new cross-modal alignment loss that…
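The abstract names two routes for metadata but does not give their format on this page. A minimal sketch of what each route could look like, assuming artist, style and genre as the metadata fields and a simple "key: value" text serialization (both hypothetical): a plain string for the first route, and a graph with typed artwork and metadata nodes for the second. In a multimodal graph as described, artwork nodes would also carry image features and a graph encoder would produce the representations; that part is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, Set, Tuple

# Hypothetical metadata fields, chosen for illustration only.
METADATA_FIELDS = ("artist", "style", "genre")


def metadata_prompt(meta: Dict[str, str]) -> str:
    """Route 1 (sketch): serialize metadata into plain text for the language model."""
    return "; ".join(f"{key}: {meta[key]}" for key in METADATA_FIELDS if key in meta)


@dataclass
class HeteroGraph:
    """Route 2 (sketch): typed nodes and typed edges only; no embeddings or GNN."""
    # node type -> set of node ids of that type
    nodes: DefaultDict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    # (source type, relation, target type) -> set of (source id, target id)
    edges: DefaultDict[Tuple[str, str, str], Set[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(set))

    def add_artwork(self, artwork_id: str, meta: Dict[str, str]) -> None:
        self.nodes["artwork"].add(artwork_id)
        for key in METADATA_FIELDS:
            if key in meta:
                self.nodes[key].add(meta[key])
                self.edges[("artwork", f"has_{key}", key)].add((artwork_id, meta[key]))

    def shared_neighbours(self, artwork_id: str) -> Set[str]:
        """Other artworks linked to artwork_id through a common metadata node (two hops)."""
        linked: Set[str] = set()
        for pairs in self.edges.values():
            values = {v for a, v in pairs if a == artwork_id}
            linked |= {a for a, v in pairs if v in values and a != artwork_id}
        return linked


g = HeteroGraph()
g.add_artwork("a1", {"artist": "Claude Monet", "style": "Impressionism"})
g.add_artwork("a2", {"artist": "Edgar Degas", "style": "Impressionism"})
g.add_artwork("a3", {"artist": "Claude Monet", "genre": "landscape"})
print(metadata_prompt({"style": "Impressionism", "artist": "Claude Monet"}))
# artist: Claude Monet; style: Impressionism
print(sorted(g.shared_neighbours("a1")))  # ['a2', 'a3']
```

The graph route lets an artwork draw on other artworks that share its metadata (here, a1 reaches a3 through the shared artist and a2 through the shared style), which a flat text string cannot express.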

Code & Models

Repositories

yanbei-jiang/artwork-interpretation
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition