Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage

Dario Cioni; Lorenzo Berlincioni; Federico Becattini; Alberto Del Bimbo

arXiv:2308.07151 · cs.CV · August 20, 2024

TL;DR

This paper proposes a diffusion-based data augmentation method that generates diverse artwork variations conditioned on captions, improving model training for cultural heritage applications despite limited data and domain shifts.

Contribution

It introduces a novel generative vision-language augmentation approach specifically designed for cultural heritage datasets, addressing data scarcity and domain shift issues.

Findings

01 Enhanced dataset diversity improves model performance.

02 Generated variations lead to better captioning accuracy.

03 Bridging domain gaps enhances visual and linguistic understanding.

Abstract

Cultural heritage applications and advanced machine learning models are creating a fruitful synergy to provide effective and accessible ways of interacting with artworks. Smart audio-guides, personalized art-related content and gamification approaches are just a few examples of how technology can be exploited to provide additional value to artists or exhibitions. Nonetheless, from a machine learning point of view, the amount of available artistic data is often not enough to train effective models. Off-the-shelf computer vision modules can still be exploited to some extent, yet a severe domain shift is present between art images and standard natural image datasets used to train such models. As a result, this can lead to degraded performance. This paper introduces a novel approach to address the challenges of limited annotated data and domain shifts in the cultural heritage domain. By…

Code & Models

Repositories

ciodar/cultural-heritage-diffaug (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition