Dataset and Case Studies for Visual Near-Duplicates Detection in the Context of Social Media
Hana Matatov, Mor Naaman, Ofra Amir

TL;DR
This paper introduces a new dataset of social media images and evaluates several visual near-duplicate detection methods. Two case studies show their potential for supporting manual content review and for large-scale automatic analysis of social media content.
Contribution
The paper provides a large-scale dataset of social media images and their manipulated versions, and assesses multiple visual feature extraction methods for near-duplicate detection, highlighting their practical applications.
Findings
High recall achieved in near-duplicate retrieval
Effective use of advanced visual features for social media images
Potential for supporting manual review and large-scale analysis
Abstract
The massive spread of visual content through the web and social media poses both challenges and opportunities. Tracking visually similar content is an important task for studying and analyzing social phenomena related to the spread of such content. In this paper, we address this need by building a dataset of social media images and evaluating visual near-duplicate retrieval methods based on image retrieval and several advanced visual feature extraction methods. We evaluate the methods using a large-scale dataset of images we crawled from social media, together with manipulated versions we generated, and present promising results in terms of recall. We demonstrate the potential of this method in two case studies: one that shows the value of creating systems that support manual content review, and another that demonstrates the usefulness of automatic large-scale data analysis.
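The abstract frames near-duplicate detection as image retrieval over visual features, evaluated by recall. The following is a minimal sketch of that general setup, not the authors' pipeline: it assumes every image has already been mapped to a fixed-length feature vector by some extractor (the paper compares several; none is reproduced here), ranks index images by cosine similarity, and reports recall@k, the fraction of manipulated queries whose original appears among the top k results. The function names (`l2_normalize`, `retrieve_top_k`, `recall_at_k`) and the noisy random vectors standing in for features of manipulated copies are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero-length rows


def retrieve_top_k(queries: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Return, for each query row, the indices of the k most similar index rows."""
    sims = l2_normalize(queries) @ l2_normalize(index).T  # cosine similarity matrix
    # Sort by descending similarity and keep the first k columns.
    return np.argsort(-sims, axis=1)[:, :k]


def recall_at_k(top_k: np.ndarray, true_ids: np.ndarray) -> float:
    """Fraction of queries whose true source image appears in their top-k list."""
    hits = (top_k == true_ids[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_images, dim, n_queries = 1000, 128, 200
    # Hypothetical stand-in for feature vectors of crawled original images.
    originals = rng.normal(size=(n_images, dim))
    # Hypothetical stand-in for features of manipulated copies:
    # each query is a noisy version of a known original.
    true_ids = rng.integers(0, n_images, size=n_queries)
    queries = originals[true_ids] + 0.3 * rng.normal(size=(n_queries, dim))
    top_k = retrieve_top_k(queries, originals, k=5)
    print(f"recall@5 = {recall_at_k(top_k, true_ids):.3f}")
```

The dense similarity matrix grows with the product of query and index sizes, so at social-media scale an approximate nearest-neighbour index would typically replace the brute-force matrix product.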
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Digital Media Forensic Detection
