The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology

Aideen Fay; Inés García-Redondo; Qiquan Wang; Haim Dubossarsky; Anthea Monod

arXiv:2505.20435·cs.LG·April 27, 2026

TL;DR

This paper uses persistent homology to analyze how adversarial inputs alter the high-dimensional geometry of LLMs' internal representations, revealing a consistent topological compression across models and attack types.

Contribution

It introduces a novel topological analysis framework that captures nonlinear geometric invariants of LLM representations under adversarial influence, complementing existing interpretability methods.

Findings

01. Adversarial inputs cause topological compression of latent spaces.

02. The topological signature is consistent across different models and attack modes.

03. The framework reveals geometric invariants that emerge early in the network and are highly discriminative.

Abstract

Existing interpretability methods for Large Language Models (LLMs) predominantly capture linear directions or isolated features. This overlooks the high-dimensional, relational, and nonlinear geometry of model representations. We apply persistent homology (PH) to characterize how adversarial inputs reshape the geometry and topology of internal representation spaces of LLMs. This phenomenon, especially when considered across operationally different attack modes, remains poorly understood. We analyze six models (3.8B to 70B parameters) under two distinct attacks, indirect prompt injection and backdoor fine-tuning, and show that a consistent topological signature persists throughout. Adversarial inputs induce topological compression, where the latent space becomes structurally simpler, collapsing from varied, compact, small-scale features into fewer, dominant, large-scale…
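
To make the idea of persistent homology on a point cloud concrete, here is a minimal sketch in pure Python. It is not the authors' pipeline: it assumes a Vietoris-Rips filtration with Euclidean distance and computes only degree-0 persistence (connected components), whereas a full analysis would typically use a dedicated library and higher homology degrees. The function name `h0_persistence` and the toy points are hypothetical stand-ins for real LLM activation vectors.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of degree-0 classes in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component
    dies at the length of the edge that first merges it into another
    one, so the finite deaths are exactly the minimum spanning tree
    edge lengths (Kruskal's algorithm with union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one class dies
            deaths.append(length)
    return deaths  # n - 1 finite deaths; one component persists forever


# Toy point cloud (hypothetical): two tight pairs that are far apart.
cloud = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_persistence(cloud))  # [1.0, 1.0, 10.0]
```

In this toy cloud the two short bars (length 1) are small-scale features within each pair, and the single long bar (length 10) is the large-scale gap between the pairs. Comparing such bar-length distributions between clean and adversarial activations is one simple way to look for the kind of shift toward fewer, larger-scale features that the abstract describes.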
