ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage

Daniela Ivanova; Marco Aversa; Paul Henderson; John Williamson

arXiv:2412.04580 · cs.CV · December 9, 2024



TL;DR

ARTeFACT is a comprehensive dataset and benchmark for damage detection in diverse analogue media, highlighting the limitations of current models in generalizing across media types and aiding future research in cultural heritage preservation.

Contribution

We introduce ARTeFACT, a large annotated dataset with textual descriptions for damage detection in various analogue media, and evaluate multiple models, revealing their generalization challenges.

Findings

01. Models struggle to generalize damage detection across media types.

02. The dataset enables benchmarking of damage detection methods.

03. Current models have significant limitations in zero-shot and cross-media settings.

Abstract

Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel in correcting degradation if the damage operator is known a priori, we show that they fail to robustly predict where the damage is even after supervised training; thus, reliable damage detection remains a challenge. Motivated by this, we introduce ARTeFACT, a dataset for damage detection in diverse types of analogue media, with over 11,000 annotations covering 15 kinds of damage across various subjects, media, and historical provenance. Furthermore, we contribute human-verified text prompts describing the semantic contents of the images, and derive additional textual descriptions of the annotated damage. We evaluate CNN, Transformer, diffusion-based segmentation models, and…
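
For readers who want to score a segmentation model against damage annotations of this kind, the following is a minimal sketch of per-class intersection-over-union (IoU) and its mean. It is illustrative only: the abstract does not state which metric the authors report, so IoU is an assumption here, as are the function names `per_class_iou` and `mean_iou`, the flat per-pixel label lists, and the extra label 0 standing for undamaged pixels alongside the 15 damage kinds.

```python
from typing import Dict, Sequence

# Assumption: 15 damage kinds (from the abstract) plus a hypothetical
# label 0 for undamaged pixels. The real label layout may differ.
NUM_CLASSES = 16


def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int = NUM_CLASSES) -> Dict[int, float]:
    """Return IoU for every class that occurs in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must cover the same pixels")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel agrees: counts once toward intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel disagrees: it lies in the union of both classes.
            union[p] += 1
            union[t] += 1
    # Classes absent from both masks are skipped rather than scored as 0.
    return {c: intersection[c] / union[c]
            for c in range(num_classes) if union[c] > 0}


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int = NUM_CLASSES) -> float:
    """Average IoU over the classes that occur in pred or target."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy six-pixel masks, not real ARTeFACT data.
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(per_class_iou(pred, target))
    print(mean_iou(pred, target))
```

Skipping classes that appear in neither mask keeps rare damage kinds from dragging the mean toward zero on images where they are simply not present; averaging over all classes instead would be an equally defensible choice.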


Code & Models

Datasets

danielaivanova/damaged-media
dataset · 232 dl


Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Adam · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Byte Pair Encoding · Label Smoothing · Dropout · Dense Connections