Can MLLMs Detect Phishing? A Comprehensive Security Benchmark Suite Focusing on Dynamic Threats and Multimodal Evaluation in Academic Environments

Jingzhuo Zhou

arXiv:2511.15165 · cs.CR · November 25, 2025

Open Access

TL;DR

This paper introduces AdapT-Bench, a comprehensive benchmark suite designed to evaluate Multimodal Large Language Models' ability to detect sophisticated, dynamic phishing attacks in academic environments, addressing current gaps in security assessment tools.

Contribution

The paper presents AdapT-Bench, a novel benchmark suite tailored for assessing MLLMs' effectiveness against evolving academic phishing threats, incorporating multimodal and contextual data.

Findings

01. MLLMs show varying effectiveness across different phishing scenarios.

02. AdapT-Bench reveals specific vulnerabilities in current MLLMs.

03. Benchmark results guide future improvements in security models.

Abstract

The rapid proliferation of Multimodal Large Language Models (MLLMs) has introduced unprecedented security challenges, particularly in phishing detection within academic environments. Academic institutions and researchers are high-value targets, facing dynamic, multilingual, and context-dependent threats that leverage research backgrounds, academic collaborations, and personal information to craft highly tailored attacks. Existing security benchmarks largely rely on datasets that do not incorporate specific academic background information, making them inadequate for capturing the evolving attack patterns and human-centric vulnerability factors specific to academia. To address this gap, we present AdapT-Bench, a unified methodological framework and benchmark suite for systematically evaluating MLLM defense capabilities against dynamic phishing attacks in academic settings.

Taxonomy

Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Misinformation and Its Impacts