CoPhIR: a Test Collection for Content-Based Image Retrieval
Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, Fausto Rabitti

TL;DR
This paper presents CoPhIR, the first large-scale CBIR test collection with 100 million images, enabling scalable similarity search experiments and fostering research collaboration.
Contribution
It introduces CoPhIR, a publicly available collection of 100 million images with their descriptive features, built through large-scale crawling and feature extraction for CBIR research.
Findings
Successfully built a 100-million-image collection.
Demonstrated feasibility of large-scale image crawling and feature extraction.
Enabled new research opportunities in scalable CBIR techniques.
Abstract
The scalability, as well as the effectiveness, of the different Content-based Image Retrieval (CBIR) approaches proposed in the literature is today an important research issue. Given the wealth of images on the Web, CBIR systems must in fact leap towards Web-scale datasets. In this paper, we report on our experience in building a test collection of 100 million images, with the corresponding descriptive features, to be used for experimenting with new scalable techniques for similarity searching and for comparing their results. In the context of the SAPIR (Search on Audio-visual content using Peer-to-peer Information Retrieval) European project, we had to test our distributed similarity search technology on a realistic dataset. Therefore, since no large-scale collection was available for research purposes, we had to tackle the non-trivial process of image crawling and descriptive feature…
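The abstract frames the collection as a common ground for comparing the results of scalable similarity search techniques. One common way to make such comparisons is to treat an exact linear scan as ground truth and measure how many of the true nearest neighbours an approximate method returns. The sketch below illustrates that idea under stated assumptions: the feature vectors are plain lists of numbers, the distance is L1 (Manhattan), and the helper names exact_knn and recall_at_k are hypothetical. None of this is taken from the paper, which this summary does not describe at the level of specific descriptors or distance functions.

```python
import heapq
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric for illustration: L1 (Manhattan) distance between feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b))


def exact_knn(query: Sequence[float], dataset: List[Sequence[float]], k: int) -> List[int]:
    # Hypothetical helper: brute-force linear scan returning the ids of the k
    # closest vectors, nearest first (ties broken by the smaller id).
    scored = ((l1_distance(query, vec), idx) for idx, vec in enumerate(dataset))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    # Hypothetical helper: fraction of the true k nearest neighbours that an
    # approximate search also returned.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real image descriptors (made-up data).
    dataset = [[0, 0], [1, 0], [5, 5], [0, 2], [9, 9]]
    truth = exact_knn([0, 0], dataset, 3)
    print(truth)                          # [0, 1, 3]
    print(recall_at_k([0, 1, 2], truth))  # 0.666...
```

A linear scan costs one distance computation per stored vector for every query, which is exactly why a collection of 100 million images calls for the scalable indexing techniques the paper aims to support; the exact scan remains useful offline as the reference answer.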
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
