Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks
Doohee You

TL;DR
This paper introduces TRIAD, a predictive framework for detecting and mitigating novel multimodal, multi-turn adversarial attacks on large language models by modeling conversational trajectories and structural anomalies.
Contribution
The paper proposes a novel dynamic safety verification framework, TRIAD, combining trajectory analysis, anomaly detection, and hazard modeling for real-time defense against unseen multimodal attacks.
Findings
TRIAD yields a mathematical bound on the expected time-to-failure under attack.
The framework effectively detects structural anomalies and malicious drift in multimodal conversations.
TRIAD offers a computationally efficient and interpretable safety safeguard for AI systems.
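To make the time-to-failure finding concrete, here is a minimal sketch of how an expected time-to-failure follows from per-turn hazards in standard discrete-time survival analysis. It is an illustration only, not TRIAD's actual model: the function names and the hazard values are hypothetical, and the summary above does not say how the paper estimates hazards or derives its bound.

```python
from typing import Sequence


def survival_curve(hazards: Sequence[float]) -> list[float]:
    """Return S(t) = prod_{i<=t} (1 - h_i) for t = 1..n.

    hazards[i] is an assumed per-turn probability that the safeguard
    fails at turn i+1, given that it has survived all earlier turns.
    """
    curve = []
    surviving = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


def expected_turns_to_failure(hazards: Sequence[float]) -> float:
    """Restricted mean time-to-failure over the observed horizon.

    Uses E[min(T, n)] = sum_{t=0}^{n-1} S(t) with S(0) = 1, where T is
    the (hypothetical) turn at which failure occurs and n = len(hazards).
    """
    if not hazards:
        return 0.0
    curve = survival_curve(hazards)
    return 1.0 + sum(curve[:-1])


if __name__ == "__main__":
    # Hypothetical hazards for a three-turn conversation.
    print(expected_turns_to_failure([0.5, 0.5, 0.5]))
```

Under a constant hazard h the failure turn is geometric, so the unrestricted expectation is 1/h; if every per-turn hazard is at least some h_min > 0, the expectation is at most 1/h_min. Whether the paper's bound takes this form is not stated in the summary.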
Abstract
The expansion of Multimodal Large Language Models (MLLMs) and their integration into autonomous agentic workflows have introduced a non-stationary attack surface. Empirical observations indicate that adversaries employ progressive, cross-modal perturbations that evade turn-specific guardrails by distributing malicious intent across longitudinal conversational trajectories. Static defense mechanisms, constrained by the Markov property, evaluate inputs in isolation and fail to detect cumulative structural poisoning. To address this limitation, this paper formulates safety verification as a dynamic survival prediction and trajectory dynamics problem. The Triple-tier Anomaly Defense (TRIAD) framework is proposed as a predictive model that represents multimodal, multi-turn conversational flow as a continuous trajectory. The framework integrates structural anomaly detection to monitor covariance…
