RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification

Mamadou Keita; Wassim Hamidouche; Hessen Bougueffa Eutamene; Abdelmalik Taleb-Ahmed; Abdenour Hadid

arXiv:2508.03967 · cs.CV · August 7, 2025

Open Access

TL;DR

RAVID introduces a retrieval-augmented visual detection framework that leverages relevant image retrieval and vision-language models to improve AI-generated image detection accuracy and robustness across various generative models and image degradations.

Contribution

The paper presents the first retrieval-augmented approach for AI-generated image detection, combining a fine-tuned CLIP encoder with a vision-language model for enhanced detection performance.

Findings

01. Achieves 93.85% accuracy on the UniversalFakeDetect benchmark.

02. Outperforms existing methods under image degradations such as Gaussian blur and JPEG compression.

03. Demonstrates robustness and generalization across multiple generative models.

Abstract

In this paper, we introduce RAVID, the first framework for AI-generated image detection that leverages visual retrieval-augmented generation (RAG). While RAG methods have shown promise in mitigating factual inaccuracies in foundation models, they have primarily focused on text, leaving visual knowledge underexplored. Meanwhile, existing detection methods, which struggle with generalization and robustness, often rely on low-level artifacts and model-specific features, limiting their adaptability. To address this, RAVID dynamically retrieves relevant images to enhance detection. Our approach utilizes a fine-tuned CLIP image encoder, RAVID CLIP, enhanced with category-related prompts to improve representation learning. We further integrate a vision-language model (VLM) to fuse retrieved images with the query, enriching the input and improving accuracy. Given a query image, RAVID generates…
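The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each image has already been mapped to an embedding vector (in the paper this is the role of the fine-tuned CLIP image encoder) and that the retrieval database stores a "real" or "fake" label with each embedding; that labeling, the toy vectors, and the names `cosine_similarity`, `retrieve_top_k`, and `build_vlm_context` are all hypothetical and not taken from the paper. The text context built at the end only stands in for the VLM fusion step, whose actual input format the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[Sequence[float], str]  # (embedding, label); the labels are an assumption


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Sequence[float], database: List[Entry], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k database entries most similar to the query as (score, label) pairs."""
    scored = [(cosine_similarity(query, emb), label) for emb, label in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


def build_vlm_context(neighbors: List[Tuple[float, str]]) -> str:
    """Hypothetical text context listing the retrieved neighbors, followed by the question."""
    lines = [
        f"Reference {i}: label={label}, similarity={score:.2f}"
        for i, (score, label) in enumerate(neighbors, start=1)
    ]
    lines.append("Question: is the query image real or AI-generated?")
    return "\n".join(lines)


# Toy 3-dimensional embeddings; real CLIP embeddings have hundreds of dimensions.
toy_database: List[Entry] = [
    ([1.0, 0.0, 0.0], "real"),
    ([0.9, 0.1, 0.0], "real"),
    ([0.0, 1.0, 0.0], "fake"),
    ([0.0, 0.9, 0.1], "fake"),
]
query_embedding = [0.1, 1.0, 0.0]

neighbors = retrieve_top_k(query_embedding, toy_database, k=2)
context = build_vlm_context(neighbors)
print(context)
```

In this toy setup the query lies closest to the two "fake" entries, so both retrieved references carry that label. A full system would replace the hand-written vectors with encoder outputs and would likely use an approximate nearest-neighbor index instead of a linear scan.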

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques