I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames
Katharina Prasse, Isaac Bravo, Stefanie Walter, Margret Keuper

TL;DR
This paper presents a novel approach to visual framing analysis by formulating image clustering as a Minimum Cost Multicut Problem, leveraging different embedding spaces to improve automated detection of visual frames in social science datasets.
Contribution
It introduces the use of the Minimum Cost Multicut formulation for image clustering in visual frame analysis and compares embedding spaces for optimal clustering performance.
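For orientation, the Minimum Cost Multicut Problem is commonly written as the integer linear program below. The notation is the standard textbook form and is an assumption here, not copied from the paper: a graph $G=(V,E)$ over images, a binary label $y_e$ per edge ($y_e = 1$ means the edge is cut, i.e. its two images land in different clusters), and a cost $c_e$ derived from the pairwise probability $p_e$ that the two images belong to the same cluster.

```latex
\begin{align}
  \min_{y \in \{0,1\}^{E}} \quad & \sum_{e \in E} c_e \, y_e,
      \qquad c_e = \log \frac{p_e}{1 - p_e} \\
  \text{s.t.} \quad & y_e \le \sum_{e' \in C \setminus \{e\}} y_{e'}
      \qquad \forall\, C \in \mathrm{cycles}(G),\ \forall\, e \in C
\end{align}
```

The cycle constraints rule out inconsistent labelings (no cycle may contain exactly one cut edge), so every feasible $y$ corresponds to a valid clustering. The number of clusters is not fixed in advance; it follows from the costs.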
Findings
DINOv2 is effective for broad visual frames
ConvNeXt V2 detects fine-grained differences
Multicut clustering improves visual frame detection
Abstract
Visual framing analysis is a key method in social sciences for determining common themes and concepts in a given discourse. To reduce manual effort, image clustering can significantly speed up the annotation process. In this work, we phrase the clustering task as a Minimum Cost Multicut Problem (MP). Solutions to the MP have been shown to provide clusterings that maximize the posterior probability, solely from provided local, pairwise probabilities of two images belonging to the same cluster. We discuss the efficacy of numerous embedding spaces to detect visual frames and show the superiority of the multicut formulation over other clustering methods. To this end, we employ the climate change dataset ClimateTV, which contains images commonly used for visual frame analysis. For broad visual frames, DINOv2 is a suitable embedding space, while ConvNeXt V2 returns a larger number of clusters which contain…
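To make the idea of clustering from pairwise probabilities concrete, here is a minimal Python sketch. It is not the authors' implementation: it uses greedy additive edge contraction, a common heuristic for the multicut problem, and the function names (`logit_costs`, `greedy_multicut`) and the toy probability matrix are hypothetical. In practice the probabilities would be derived from distances in an embedding space such as DINOv2 or ConvNeXt V2; how the paper does that mapping is not stated in this listing.

```python
import numpy as np


def logit_costs(prob, eps=1e-6):
    """Turn same-cluster probabilities into signed edge costs.

    Positive cost = attractive (likely same cluster),
    negative cost = repulsive (likely different clusters).
    """
    p = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def greedy_multicut(cost):
    """Heuristic multicut on a complete graph via greedy edge contraction.

    Repeatedly merges the two clusters joined by the largest positive
    summed cost and stops once no attractive edge remains.
    Returns one integer cluster label per node.
    """
    n = cost.shape[0]
    w = np.array(cost, dtype=float)
    np.fill_diagonal(w, -np.inf)           # never select self-edges
    members = {i: [i] for i in range(n)}   # cluster representative -> nodes
    active = list(range(n))

    while len(active) > 1:
        sub = w[np.ix_(active, active)]
        a, b = divmod(int(np.argmax(sub)), len(active))
        if sub[a, b] <= 0:                 # only repulsive edges are left
            break
        i, j = active[a], active[b]
        # Contract j into i: parallel edges add up their costs.
        w[i, :] += w[j, :]
        w[:, i] += w[:, j]
        w[i, i] = -np.inf
        members[i].extend(members.pop(j))
        active.remove(j)

    labels = np.empty(n, dtype=int)
    for label, rep in enumerate(sorted(members)):
        labels[members[rep]] = label
    return labels


# Toy example (made-up numbers): images 0 and 1 look alike, as do 2 and 3.
toy_prob = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
print(greedy_multicut(logit_costs(toy_prob)))
```

Note that the number of clusters is never passed in; it emerges from where the summed costs turn repulsive, which is consistent with the abstract's remark that different embedding spaces yield different numbers of clusters.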
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: ConvNeXt · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
