Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

V. Sevetlidis; V. Arampatzakis; M. Karta; I. Mourthos; D. Tsiafaki; G. Pavlidis

arXiv:2604.04071·cs.CV·April 7, 2026

TL;DR

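This paper introduces a positive-unlabeled (PU) learning method for detecting duplicate media in cultural repositories. It trains a lightweight per-query encoder from a single anchor per artefact, flags candidates with an interpretable threshold, and on AtticPOT improves F1 by 7.70 points over the best baseline (SVDD).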

Contribution

It formulates curator-in-the-loop duplicate discovery as a PU learning problem that needs only a single anchor per artefact and no explicit negatives, and evaluates the approach on CIFAR-10 and the AtticPOT repository.

Findings

01

Achieved F1=96.37 on CIFAR-10 and F1=90.79 on AtticPOT

02

Improved F1 on AtticPOT by 7.70 points over the best baseline (SVDD) under the same lightweight backbone

03

Provides an interpretable threshold and avoids explicit negatives

Abstract

We formulate curator-in-the-loop duplicate discovery in the AtticPOT repository as a Positive-Unlabeled (PU) learning problem. Given a single anchor per artefact, we train a lightweight per-query Clone Encoder on augmented views of the anchor and score the unlabeled repository with an interpretable threshold on the latent ℓ2 norm. The system proposes candidates for curator verification, uncovering cross-record duplicates that were not verified a priori. On CIFAR-10 we obtain F1=96.37 (AUROC=97.97); on AtticPOT we reach F1=90.79 (AUROC=98.99), improving F1 by +7.70 points over the best baseline (SVDD) under the same lightweight backbone. Qualitative "find-similar" panels show stable neighbourhoods across viewpoint and condition. The method avoids explicit negatives, offers a transparent operating point, and fits de-duplication, record linkage, and curator-in-the-loop workflows.
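
The scoring step described in the abstract (fit something to augmented views of one anchor, then threshold the latent norm over the unlabeled repository) can be illustrated with a toy sketch. This is not the authors' Clone Encoder: the abstract gives no architecture or loss, so the sketch swaps in a fixed affine map fitted to the views. The function names, the Gaussian-jitter augmentation, the 95th-percentile rule, and the `slack` factor are all assumptions made for illustration.

```python
import numpy as np


def augment(anchor, n_views, noise, rng):
    # Hypothetical augmentation: Gaussian jitter around the anchor's feature vector.
    return anchor + noise * rng.standard_normal((n_views, anchor.shape[0]))


def fit_stand_in_encoder(views, eps=1e-6):
    # Stand-in for the trained Clone Encoder (assumption): an affine map that
    # centres the augmented views at the origin and rescales each dimension.
    return views.mean(axis=0), views.std(axis=0) + eps


def latent_norm(x, mu, sigma):
    # l2 norm of the latent code, divided by sqrt(d) so the scale does not grow with d.
    z = (x - mu) / sigma
    return np.linalg.norm(z, axis=-1) / np.sqrt(z.shape[-1])


def propose_candidates(anchor, repository, n_views=200, noise=0.05, slack=2.0, seed=0):
    rng = np.random.default_rng(seed)
    views = augment(anchor, n_views, noise, rng)
    mu, sigma = fit_stand_in_encoder(views)
    # Threshold (assumption): a slack multiple of the 95th percentile of the views' norms.
    tau = slack * np.quantile(latent_norm(views, mu, sigma), 0.95)
    scores = latent_norm(repository, mu, sigma)
    # Items whose latent norm falls under tau are proposed for curator verification.
    return np.flatnonzero(scores <= tau), tau


# Synthetic demo: five near-copies of the anchor followed by fifty unrelated items.
rng = np.random.default_rng(1)
anchor = rng.standard_normal(32)
clones = augment(anchor, 5, 0.05, rng)
others = rng.standard_normal((50, 32))
repository = np.vstack([clones, others])
candidates, tau = propose_candidates(anchor, repository)
print(candidates, round(float(tau), 2))
```

Only the anchor (the positive) and unlabeled items are used; no negatives are ever labeled, and the single scalar `tau` is the operating point a curator could inspect or adjust.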
