EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections
Francesc Net, Lluis Gomez

TL;DR
This paper introduces EUFCC-CIR, a dataset of over 180K annotated triplets for Composed Image Retrieval (CIR) in cultural heritage collections, intended to support AI-driven exploration of Galleries, Libraries, Archives, and Museums (GLAM) collections.
Contribution
It provides a novel, extensive CIR dataset tailored for Digital Humanities, filling a gap in resources for cultural heritage image retrieval.
Findings
EUFCC-CIR contains over 180K annotated triplets.
The authors compare EUFCC-CIR with existing CIR datasets to highlight its distinguishing qualities.
Several zero-shot CIR baselines are evaluated on EUFCC-CIR.
Abstract
The intersection of Artificial Intelligence and Digital Humanities enables researchers to explore cultural heritage collections with greater depth and scale. In this paper, we present EUFCC-CIR, a dataset designed for Composed Image Retrieval (CIR) within Galleries, Libraries, Archives, and Museums (GLAM) collections. Our dataset is built on top of the EUFCC-340K image labeling dataset and contains over 180K annotated CIR triplets. Each triplet is composed of a multi-modal query (an input image plus a short text describing the desired attribute manipulations) and a set of relevant target images. The EUFCC-CIR dataset fills an existing gap in CIR-specific resources for Digital Humanities. We demonstrate the value of the EUFCC-CIR dataset by highlighting its unique qualities in comparison to other existing CIR datasets and evaluating the performance of several zero-shot CIR baselines.
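The triplet structure described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the dataset's actual schema: the class name `CIRTriplet`, its field names, and the image IDs are hypothetical. The helper `recall_at_k` uses Recall@K, a metric commonly reported for CIR; the abstract does not state which metrics the paper uses, so treat that choice as an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class CIRTriplet:
    """One composed-image-retrieval example (hypothetical field names)."""
    reference_image: str            # ID of the input (query) image
    modification_text: str          # short text describing the desired attribute change
    target_images: FrozenSet[str]   # IDs of all relevant target images


def hit_at_k(triplet: CIRTriplet, ranked_ids: Sequence[str], k: int) -> bool:
    """Return True if any relevant target is among the top-k retrieved IDs."""
    return any(image_id in triplet.target_images for image_id in ranked_ids[:k])


def recall_at_k(triplets: List[CIRTriplet],
                rankings: List[Sequence[str]],
                k: int) -> float:
    """Fraction of queries with at least one relevant target in the top k.

    Assumed metric definition: a query counts as a hit when any of its
    targets is retrieved, since each query has a set of relevant targets.
    """
    if not triplets or len(triplets) != len(rankings):
        raise ValueError("need one ranking per triplet, and at least one triplet")
    hits = sum(hit_at_k(t, r, k) for t, r in zip(triplets, rankings))
    return hits / len(triplets)
```

To use it, build one `CIRTriplet` per query, obtain a ranked list of gallery image IDs from any retrieval model, and pass both lists to `recall_at_k` with the chosen cutoff.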
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Colorectal Cancer Screening and Detection
Methods: Sparse Evolutionary Training
