CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

Sri Durga Sai Sowmya Kadali; Evangelos E. Papalexakis

arXiv:2508.02997·cs.CL·August 29, 2025

CoCoTen: Detecting Adversarial Inputs to Large Language Models through Latent Space Features of Contextual Co-occurrence Tensors

Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis

PDF

TL;DR

This paper introduces a novel detection method for adversarial prompts in Large Language Models using latent space features of Contextual Co-occurrence Tensors, achieving high accuracy with minimal labeled data and significant speed improvements.

Contribution

It presents a new approach leveraging latent space characteristics of Contextual Co-occurrence Matrices for effective adversarial prompt detection in LLMs.

Findings

01

F1 score of 0.83 with only 0.5% labeled prompts

02

96.6% improvement over baseline methods

03

Speedup ranging from 2.3 to 128.4 times

Abstract

The widespread use of Large Language Models (LLMs) in many applications marks a significant advance in research and practice. However, their complexity and hard-to-understand nature make them vulnerable to attacks, especially jailbreaks designed to produce harmful responses. To counter these threats, developing strong detection methods is essential for the safe and reliable use of LLMs. This paper studies this detection problem using the Contextual Co-occurrence Matrix, a structure recognized for its efficacy in data-scarce environments. We propose a novel method leveraging the latent space characteristics of Contextual Co-occurrence Matrices and Tensors for the effective identification of adversarial and jailbreak prompts. Our evaluations show that this approach achieves a notable F1 score of 0.83 using only 0.5% of labeled prompts, which is a 96.6% improvement over baselines. This…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.