Seeing Candidates at Scale: Multimodal LLMs for Visual Political Communication on Instagram
Michael Achmann-Denkler, Mario Haim, Christian Wolff

TL;DR
This study evaluates a multimodal large language model (GPT-4o) for analyzing political visual content on Instagram from the 2021 German federal election campaign, showing that it outperforms traditional computer vision models at identifying front-runner politicians and counting individuals in images.
Contribution
The paper shows that a multimodal LLM can identify front-runner politicians and count people in Instagram stories and posts, pointing toward scalable and accurate methods for visual content analysis in political communication.
Findings
GPT-4o achieved a macro F1-score of 0.89 for face recognition.
GPT-4o achieved a macro F1-score of 0.86 for person counting.
GPT-4o, the multimodal LLM tested, outperformed the traditional computer vision models (FaceNet512, RetinaFace, Google Cloud Vision) on both tasks.
Abstract
This paper presents a computational case study that evaluates the capabilities of specialized machine learning models and emerging multimodal large language models for Visual Political Communication (VPC) analysis. Focusing on concentrated visibility in Instagram stories and posts during the 2021 German federal election campaign, we compare the performance of traditional computer vision models (FaceNet512, RetinaFace, Google Cloud Vision) with a multimodal large language model (GPT-4o) in identifying front-runner politicians and counting individuals in images. GPT-4o outperformed the other models, achieving a macro F1-score of 0.89 for face recognition and 0.86 for person counting in stories. These findings demonstrate the potential of advanced AI systems to scale and refine visual content analysis in political communication while highlighting methodological considerations for future…
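The reported scores are macro F1-scores, meaning the F1-score is computed separately for each class and then averaged without weighting by class frequency. A minimal sketch of that calculation follows; the function name `macro_f1` and the labels `candidate_a`, `candidate_b`, and `none` are hypothetical placeholders for illustration, not the paper's code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores (illustrative sketch)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical annotations: which candidate (if any) appears in each image.
gold = ["candidate_a", "candidate_a", "candidate_b", "none"]
predicted = ["candidate_a", "candidate_b", "candidate_b", "none"]
print(round(macro_f1(gold, predicted), 4))
```

Because every class counts equally, a model that does well on frequent classes but poorly on a rare one is penalized more under macro averaging than under accuracy.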
