Evaluating the Impact of Data Anonymization on Image Retrieval
Marvin Chen, Manuel Eberhardinger, Johannes Maucher

TL;DR
This paper systematically evaluates how different data anonymization techniques impact the performance of content-based image retrieval systems, providing insights for balancing privacy and retrieval accuracy.
Contribution
The paper introduces a systematic evaluation framework and assesses multiple anonymization methods and strategies on several datasets, revealing biases and their effects on retrieval performance.
Findings
Models trained on original (non-anonymized) data yield retrievals after anonymization that are more similar to those obtained before anonymization.
Anonymization introduces a bias favoring models trained on unaltered data.
The study offers practical insights for developing privacy-preserving CBIR systems.
Abstract
With the growing importance of privacy regulations such as the General Data Protection Regulation, anonymizing visual data is becoming increasingly relevant across institutions. However, anonymization can negatively affect the performance of Computer Vision systems that rely on visual features, such as Content-Based Image Retrieval (CBIR). Despite this, the impact of anonymization on CBIR has not been systematically studied. This work addresses this gap, motivated by the DOKIQ project, an artificial intelligence-based system for document verification actively used by the State Criminal Police Office Baden-Württemberg. We propose a simple evaluation framework: retrieval results after anonymization should match those obtained before anonymization as closely as possible. To this end, we systematically assess the impact of anonymization using two public datasets and the internal DOKIQ…
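The evaluation idea above (retrievals after anonymization should match those before) can be illustrated with a minimal sketch. It assumes retrieval is nearest-neighbour search by cosine similarity over precomputed image embeddings, and that agreement is measured as the mean overlap of the top-k result sets. This summary does not state the paper's actual metric, and the function names `top_k_indices` and `retrieval_overlap_at_k` are hypothetical.

```python
import numpy as np


def top_k_indices(query_emb, gallery_emb, k):
    """Hypothetical helper: indices of the k most cosine-similar gallery items per query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T  # (num_queries, num_gallery) cosine similarities
    # Sort descending by similarity and keep the first k columns.
    return np.argsort(-sims, axis=1)[:, :k]


def retrieval_overlap_at_k(q_orig, g_orig, q_anon, g_anon, k=5):
    """Assumed metric: mean fraction of top-k results shared before and after anonymization."""
    before = top_k_indices(q_orig, g_orig, k)
    after = top_k_indices(q_anon, g_anon, k)
    overlaps = [len(set(b) & set(a)) / k for b, a in zip(before, after)]
    return float(np.mean(overlaps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(100, 64))
    queries = rng.normal(size=(10, 64))
    # Unchanged embeddings give identical retrievals, so the overlap is 1.0.
    print(retrieval_overlap_at_k(queries, gallery, queries, gallery, k=5))
    # Additive noise stands in (purely as an assumption) for the effect of anonymization.
    noisy_gallery = gallery + rng.normal(scale=0.5, size=gallery.shape)
    print(retrieval_overlap_at_k(queries, gallery, queries, noisy_gallery, k=5))
```

Under this framing, a model whose embeddings shift less under anonymization scores closer to 1.0. That is one way to read the finding that models trained on original data yield more similar retrievals after anonymization.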
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Privacy-Preserving Technologies in Data
