Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
Mamadou Keita, Wassim Hamidouche, Hassen Bougueffa, Abdenour Hadid, Abdelmalik Taleb-Ahmed

TL;DR
This paper explores using large vision-language models like BLIP-2 and ViTGPT2, tuned for image captioning, to effectively detect synthetic images and combat misinformation, outperforming traditional methods.
Contribution
The paper introduces an approach that tunes advanced vision-language models specifically for synthetic image detection, improving accuracy over existing techniques.
Findings
VLMs outperform traditional detection methods.
Tuned captioning models improve synthetic image identification.
Code and models are publicly available.
Abstract
In recent years, the emergence of models capable of generating images from text has attracted considerable interest, offering the possibility of creating realistic images from text descriptions. Yet these advances have also raised concerns about the potential misuse of these images, including the creation of misleading content such as fake news and propaganda. This study investigates the effectiveness of using advanced vision-language models (VLMs) for synthetic image identification. Specifically, the focus is on tuning state-of-the-art image captioning models for synthetic image detection. By harnessing the robust understanding capabilities of large VLMs, the aim is to distinguish authentic images from synthetic images produced by diffusion-based models. This study contributes to the advancement of synthetic image detection by exploiting the capabilities of visual language models such…
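The abstract frames detection as a use of tuned image captioning models. One possible reading, sketched below under stated assumptions, is that the captioner is tuned to emit a short label-like caption that is then parsed into a decision. The label words "real" and "fake", the helper names, and the stub captioner are all hypothetical; the text above does not specify the target captions or the parsing step, and a real pipeline would replace the stub with a tuned model such as BLIP-2 or ViTGPT2.

```python
from typing import Callable, Iterable, List

# Hypothetical label vocabulary; the source does not state the target captions.
REAL, FAKE = "real", "fake"


def caption_to_label(caption: str) -> str:
    """Map a generated caption to a label using the first label word found."""
    for token in caption.lower().split():
        word = token.strip(".,!?\"'")
        if word in (REAL, FAKE):
            return word
    return "unknown"  # the caption contained neither label word


def detect(images: Iterable[object], captioner: Callable[[object], str]) -> List[str]:
    """Caption each image with the (tuned) model and parse the output."""
    return [caption_to_label(captioner(image)) for image in images]


def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def stub_captioner(image: object) -> str:
    """Stand-in for a tuned VLM: keys off the file name, not the pixels."""
    return "A fake image." if str(image).startswith("gen_") else "real"


predictions = detect(["gen_001.png", "photo_17.jpg"], stub_captioner)
```

Keeping the parser separate from the captioner means the same evaluation code can be reused across different captioning backbones, since only the callable passed to `detect` changes.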
Taxonomy
Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications
MethodsFocus
