EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models

Nastaran Darabi; Devashri Naik; Sina Tayebati; Dinithi Jayasuriya; Ranganath Krishnan; Amit Ranjan Trivedi

arXiv:2502.14976·cs.LG·February 24, 2025

TL;DR

EigenShield is a novel inference-time defense for vision-language models that uses Random Matrix Theory to detect and filter adversarial noise in high-dimensional representations, improving robustness without retraining.

Contribution

It introduces a spectral analysis-based method leveraging the spiked covariance model to detect adversarial disruptions in VLMs, avoiding costly retraining and architecture modifications.

Findings

01

EigenShield outperforms existing defenses such as adversarial training and UNIGUARD.

02

It detects adversarial noise through structured spectral deviations in high-dimensional embeddings.

03

The method is architecture-independent and attack-agnostic, and it runs at inference time without retraining.

Abstract

Vision-Language Models (VLMs) inherit adversarial vulnerabilities of Large Language Models (LLMs), which are further exacerbated by their multimodal nature. Existing defenses, including adversarial training, input transformations, and heuristic detection, are computationally expensive, architecture-dependent, and fragile against adaptive attacks. We introduce EigenShield, an inference-time defense leveraging Random Matrix Theory to quantify adversarial disruptions in high-dimensional VLM representations. Unlike prior methods that rely on empirical heuristics, EigenShield employs the spiked covariance model to detect structured spectral deviations. Using a Robustness-based Nonconformity Score (RbNS) and quantile-based thresholding, it separates causal eigenvectors, which encode semantic information, from correlational eigenvectors that are susceptible to adversarial artifacts. By…
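
The spiked covariance idea named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define the Robustness-based Nonconformity Score or the quantile rule, so neither is reproduced here. Instead, the sketch uses the standard Marchenko-Pastur upper bulk edge, sigma^2 (1 + sqrt(p/n))^2, to separate "spike" eigenvectors from the noise bulk, and then projects a perturbed vector onto the retained subspace. The function names, the known noise variance `sigma2`, the safety `margin`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for i.i.d. noise of variance sigma2."""
    gamma = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def fit_spike_subspace(X, sigma2=1.0, margin=1.1):
    """Keep eigenvectors of the sample covariance whose eigenvalues exceed the bulk edge.

    `margin` is a hypothetical safety factor (not from the paper) that guards
    against noise eigenvalues fluctuating slightly above the asymptotic edge.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / n
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    threshold = margin * mp_upper_edge(n, p, sigma2)
    keep = eigvals > threshold
    return mean, eigvecs[:, keep], eigvals


def project_onto_spikes(x, mean, spike_vecs):
    """Orthogonally project x onto the affine subspace spanned by the spike eigenvectors."""
    return mean + spike_vecs @ (spike_vecs.T @ (x - mean))


# Synthetic stand-in for embeddings: 3 planted directions plus unit-variance noise.
rng = np.random.default_rng(0)
n, p, k = 2000, 50, 3
U, _ = np.linalg.qr(rng.standard_normal((p, k)))  # orthonormal planted directions
signal = 5.0 * rng.standard_normal((n, k))        # variance 25 along each direction
X = signal @ U.T + rng.standard_normal((n, p))

mean, spike_vecs, eigvals = fit_spike_subspace(X)
n_spikes = spike_vecs.shape[1]

# A vector lying in the planted subspace, perturbed by isotropic noise.
clean = U @ np.array([5.0, -5.0, 5.0])
noisy = clean + 0.5 * rng.standard_normal(p)
filtered = project_onto_spikes(noisy, mean, spike_vecs)

err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(filtered - clean)
print(n_spikes, err_before, err_after)
```

In this toy setting the projection discards most of the perturbation because only a few of the 50 directions are retained. Real embeddings rarely have a known noise variance, so `sigma2` would have to be estimated, and the paper's own scoring and thresholding may select eigenvectors differently.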
