From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

Shahroz Tariq; Simon S. Woo; Priyanka Singh; Irena Irmalasari; Saakshi Gupta; Dev Gupta

arXiv:2508.07596 · cs.CV · August 12, 2025

TL;DR

This paper introduces DF-P2E, a multimodal deepfake detection framework that combines visual, semantic, and narrative explanations to improve interpretability and usability for non-expert users, while maintaining high detection accuracy.

Contribution

The paper presents a novel, modular framework that integrates visual saliency, natural language summaries, and context-aware explanations, advancing interpretability in deepfake detection systems.

Findings

01. Achieves competitive detection performance on the DF40 benchmark.

02. Provides high-quality, aligned explanations with Grad-CAM visualizations.

03. Enhances interpretability and user trust in deepfake detection models.

Abstract

The proliferation of deepfake technologies poses urgent challenges and serious risks to digital integrity, particularly within critical sectors such as forensics, journalism, and the legal system. While existing detection systems have made significant progress in classification accuracy, they typically function as black-box models, offering limited transparency and minimal support for human reasoning. This lack of interpretability hinders their usability in real-world decision-making contexts, especially for non-expert users. In this paper, we present DF-P2E (Deepfake: Prediction to Explanation), a novel multimodal framework that integrates visual, semantic, and narrative layers of explanation to make deepfake detection interpretable and accessible. The framework consists of three modular components: (1) a deepfake classifier with Grad-CAM-based saliency visualisation, (2) a visual…
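
The Grad-CAM saliency step named in component (1) can be illustrated with a minimal sketch of the standard Grad-CAM computation: average each channel's gradients over the spatial grid to get a channel weight, take the weighted sum of the feature maps, apply a ReLU, and rescale for display. This is not the authors' implementation. The function name `grad_cam`, the nested-list input format, and the toy values are assumptions made for illustration; in practice the activations and gradients would be read from a chosen convolutional layer of the classifier.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Compute a Grad-CAM heatmap for one layer (hypothetical helper).

    activations[k][i][j]: value of feature map k at position (i, j).
    gradients[k][i][j]:   gradient of the class score w.r.t. that value.
    """
    if not activations or len(activations) != len(gradients):
        raise ValueError("need the same non-zero number of channels in both inputs")
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of each channel's gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Rescale to [0, 1] for display; an all-zero map is left unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy input (assumed values): channel 0 supports the class, channel 1 opposes it.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(toy_activations, toy_gradients)
print(heatmap)
```

In the toy input, only the top-left position keeps a non-zero score, because the second channel has a negative weight and its contribution is removed by the ReLU. A real pipeline would upsample the heatmap to the input resolution and overlay it on the image.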
