EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections

Francesc Net; Marc Folia; Pep Casals; Andrew D. Bagdanov; Lluis Gomez

arXiv:2406.02380·cs.CV·June 5, 2024

TL;DR

This paper introduces EUFCC340K, a large hierarchical dataset for metadata annotation in GLAM collections, and develops baseline models to improve multi-label image tagging and cataloging in cultural heritage.

Contribution

The paper presents a new large-scale, hierarchically organized dataset and baseline models for multi-label metadata annotation in GLAM collections, addressing a key challenge in cultural heritage cataloging.

Findings

01. Baseline models show promising accuracy in multi-label classification.

02. The dataset improves robustness and generalization in metadata annotation tasks.

03. Models outperform existing methods on the new dataset.

Abstract

In this paper, we address the challenges of automatic metadata annotation in the domain of Galleries, Libraries, Archives, and Museums (GLAMs) by introducing a novel dataset, EUFCC340K, collected from the Europeana portal. Comprising over 340,000 images, the EUFCC340K dataset is organized across multiple facets: Materials, Object Types, Disciplines, and Subjects, following a hierarchical structure based on the Art & Architecture Thesaurus (AAT). We developed several baseline models, incorporating multiple heads on a ConvNeXt backbone for multi-label image tagging on these facets, and fine-tuning a CLIP model with our image-text pairs. Our experiments to evaluate model robustness and generalization capabilities in two different test scenarios demonstrate the utility of the dataset in improving multi-label classification tools that have the potential to alleviate cataloging tasks in the…
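The abstract describes facet labels arranged in a hierarchy and used as multi-label tagging targets. A minimal sketch of how such targets could be built for a single facet is given below. The toy hierarchy, the tag names, and the helper functions `ancestors_and_self`, `expand_labels`, and `multi_hot` are all hypothetical and are not taken from the dataset or its repository. Marking every ancestor of an annotated tag as positive is an assumption made for illustration, not a confirmed detail of the paper.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy hierarchy for one facet (child -> parent).
# These names are illustrative only; they are not copied from EUFCC-340K or the AAT.
PARENT: Dict[str, Optional[str]] = {
    "materials": None,
    "metal": "materials",
    "bronze": "metal",
    "silver": "metal",
    "wood": "materials",
    "oak": "wood",
}


def ancestors_and_self(tag: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the tag followed by each of its ancestors, ending at the root."""
    path: List[str] = []
    node: Optional[str] = tag
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def expand_labels(tags: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assumption: an annotated tag also implies all of its ancestors."""
    expanded: Set[str] = set()
    for tag in tags:
        expanded.update(ancestors_and_self(tag, parent))
    return expanded


def multi_hot(
    tags: Iterable[str], vocabulary: List[str], parent: Dict[str, Optional[str]]
) -> List[int]:
    """Encode the expanded tag set as a 0/1 vector over a fixed vocabulary order."""
    active = expand_labels(tags, parent)
    return [1 if label in active else 0 for label in vocabulary]


# Fixed label order: ['bronze', 'materials', 'metal', 'oak', 'silver', 'wood']
VOCAB = sorted(PARENT)

if __name__ == "__main__":
    print(VOCAB)
    print(multi_hot(["bronze"], VOCAB, PARENT))
```

In a multi-head setup, one such vector per facet (Materials, Object Types, Disciplines, Subjects) would serve as the target for that facet's classification head, typically trained with a per-label binary loss.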

Code & Models

Repositories

cesc47/EUFCC-340K (official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Semantic Web and Ontologies · Handwritten Text Recognition Techniques

Methods: ConvNeXt · Contrastive Language-Image Pre-training
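For readers unfamiliar with the contrastive language-image pre-training method listed above, the following is a dependency-free sketch of the symmetric contrastive objective that CLIP-style models are usually trained with. It assumes matched image and text embeddings share the same index within a batch. The function names and the toy embeddings are hypothetical, and the exact loss and hyperparameters used to fine-tune the paper's model are not stated on this page, so this should be read as an illustration of the general technique only.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for one row of logits."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(
    image_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,  # CLIP's initial temperature; an assumption here
) -> float:
    """Symmetric contrastive loss: pair k of images and texts is the positive."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on its own index.
    loss_i2t = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its own index.
    loss_t2i = sum(
        cross_entropy_row([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings, illustrative only
    texts_matched = [[2.0, 0.0], [0.0, 3.0]]
    texts_swapped = [[0.0, 3.0], [2.0, 0.0]]
    print(clip_style_loss(images, texts_matched))
    print(clip_style_loss(images, texts_swapped))
```

The loss is near zero when each image embedding is closest to its own text embedding and grows large when the pairs are mismatched, which is the signal that pulls matching image-text pairs together during fine-tuning.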