Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Mamadou Keita; Wassim Hamidouche; Hassen Bougueffa; Abdenour Hadid; Abdelmalik Taleb-Ahmed

arXiv:2404.02726·cs.CV·April 4, 2024·1 cites

TL;DR

This paper explores using large vision-language models like BLIP-2 and ViTGPT2, tuned for image captioning, to effectively detect synthetic images and combat misinformation, outperforming traditional methods.

Contribution

It introduces a novel approach of tuning advanced vision-language models specifically for synthetic image detection, enhancing accuracy over existing techniques.

Findings

01

VLMs outperform traditional detection methods.

02

Tuned captioning models improve synthetic image identification.

03

Code and models are publicly available.

Abstract

In recent years, the emergence of models capable of generating images from text has attracted considerable interest, offering the possibility of creating realistic images from text descriptions. Yet these advances have also raised concerns about the potential misuse of these images, including the creation of misleading content such as fake news and propaganda. This study investigates the effectiveness of using advanced vision-language models (VLMs) for synthetic image identification. Specifically, the focus is on tuning state-of-the-art image captioning models for synthetic image detection. By harnessing the robust understanding capabilities of large VLMs, the aim is to distinguish authentic images from synthetic images produced by diffusion-based models. This study contributes to the advancement of synthetic image detection by exploiting the capabilities of visual language models such…
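One way to picture detection with a tuned captioning model is as label decoding: the model emits a short caption and the caption is mapped back to a binary decision. The sketch below is a minimal illustration of that mapping only, not the authors' implementation. The word lists, the function names `caption_to_label` and `accuracy`, and the idea that the tuned model answers with words such as "real" or "fake" are all assumptions made for this example; no captioning model is loaded here.

```python
import re
from typing import Optional, Sequence

# Assumed label vocabulary (hypothetical, not taken from the paper).
REAL_WORDS = {"real", "authentic"}
FAKE_WORDS = {"fake", "synthetic"}


def caption_to_label(caption: str) -> Optional[int]:
    """Map a generated caption to 0 (real), 1 (synthetic), or None if undecided."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    has_real = bool(tokens & REAL_WORDS)
    has_fake = bool(tokens & FAKE_WORDS)
    if has_real == has_fake:
        # Either no label word or both kinds appear: treat as undecided.
        return None
    return 1 if has_fake else 0


def accuracy(captions: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of captions whose decoded label matches the ground truth.

    Undecided captions (None) count as errors.
    """
    if len(captions) != len(labels):
        raise ValueError("captions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    preds = [caption_to_label(c) for c in captions]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical model outputs and ground-truth labels (1 = synthetic).
    outputs = ["real", "Fake", "a photo of a dog"]
    truth = [0, 1, 1]
    print(accuracy(outputs, truth))
```

Counting undecided captions as errors is a conservative choice for this sketch; a real evaluation might instead constrain decoding so that only the label words can be generated.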

Code & Models

Repositories

mamadou-keita/vlm-detect
PyTorch · Official

Taxonomy

Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications

Methods: Focus