Speaking images. A novel framework for the automated self-description of artworks

Valentine Bernasconi; Gustavo Marfia

arXiv:2506.05368 · cs.CV · June 9, 2025

Open Access

TL;DR

This paper introduces a framework that automatically creates explanatory videos of digitized artworks using AI models, enhancing accessibility and interpretation of cultural artifacts.

Contribution

It presents a novel open-source AI-based system for generating self-explaining artwork videos, combining face detection, text-to-speech, and animation models.
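
The stages named here (together with the large language model mentioned in the abstract) can be pictured as a simple chain. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`Stages`, `make_video`, the four stage functions) is hypothetical, the stage order is assumed, and the stages are stubs so the example runs without any model installed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# All names below are hypothetical placeholders, not the authors' API.
Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


@dataclass
class Stages:
    detect_face: Callable[[str], Box]          # image path -> face box
    describe: Callable[[str], str]             # image path -> spoken script (language model)
    synthesize: Callable[[str], bytes]         # script -> speech audio (text-to-speech)
    animate: Callable[[str, Box, bytes], str]  # image, face box, audio -> video path


def make_video(image_path: str, stages: Stages) -> str:
    """Run the four stages in an assumed order and return the video path."""
    face = stages.detect_face(image_path)
    script = stages.describe(image_path)
    audio = stages.synthesize(script)
    return stages.animate(image_path, face, audio)


# Stub stages standing in for real models (assumption: any real model
# would be wrapped to match these signatures).
stub = Stages(
    detect_face=lambda path: (10, 20, 64, 64),
    describe=lambda path: f"I am the main figure of {path}.",
    synthesize=lambda text: text.encode("utf-8"),
    animate=lambda path, box, audio: path.rsplit(".", 1)[0] + ".mp4",
)

print(make_video("portrait.jpg", stub))
```

Keeping each stage behind a plain callable means a model could be swapped without touching the chaining logic, which is one plausible way to organize such a system.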

Findings

01. Automated video generation from digital artworks.

02. Addresses cultural biases in AI models.

03. Explores educational and artistic applications.

Abstract

Recent breakthroughs in generative AI have opened the door to new research perspectives in the domain of art and cultural heritage, where a large number of artifacts have been digitized. There is a need for innovation to ease access to digital collections and highlight their content. Such innovations develop into creative explorations of the digital image in relation to its malleability and contemporary interpretation, in confrontation with the original historical object. Based on the concept of the autonomous image, we propose a new framework for producing self-explaining cultural artifacts using open-source large language, face detection, text-to-speech and audio-to-animation models. The goal is to start from a digitized artwork and automatically assemble a short video of it in which the main character animates to explain its content. The whole process questions…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis · Multimodal Machine Learning Applications