Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models

Md Rafi Ur Rashid; Vishnu Asutosh Dasu; Ye Wang; Gang Tan; Shagufta Mehnaz

arXiv:2505.17089 · cs.CL · November 18, 2025

TL;DR

This paper presents Adversarial Scenario Extrapolation (ASE), an inference-time framework that uses Chain-of-Thought (CoT) reasoning to make large language models more robust to adversarial attacks while preserving the user experience.

Contribution

The paper introduces Adversarial Scenario Extrapolation (ASE), a method that improves both LLM robustness and seamlessness by having the model itself contemplate potential adversarial scenarios and formulate defensive strategies before it responds.

Findings

01. Near-zero jailbreak success rates achieved.
02. Significant reduction in toxicity and bias scores.
03. Outperforms existing defenses in both robustness and seamlessness.
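Jailbreak success rate is commonly reported as the fraction of adversarial prompts that elicit a harmful response. The minimal Python sketch below computes that ratio under stated assumptions: `respond` stands for the defended model and `is_harmful` for whatever judge (human annotation, a classifier, or an LLM judge) decides whether an attack succeeded. Both names are hypothetical, and the paper's exact evaluation protocol is not reproduced here.

```python
from typing import Callable, Sequence


def jailbreak_success_rate(
    attack_prompts: Sequence[str],
    respond: Callable[[str], str],            # hypothetical: the defended model
    is_harmful: Callable[[str, str], bool],   # hypothetical: judge(prompt, response)
) -> float:
    """Return the fraction of attack prompts whose response the judge flags as harmful."""
    if not attack_prompts:
        raise ValueError("need at least one attack prompt")
    # Count attacks where the judge considers the model's response a successful jailbreak.
    successes = sum(1 for p in attack_prompts if is_harmful(p, respond(p)))
    return successes / len(attack_prompts)
```

A defense with near-zero jailbreak success drives this ratio toward 0.0 on a given benchmark; rates are only comparable across defenses when the benchmark and judge are held fixed.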

Abstract

Large Language Models (LLMs) exhibit impressive capabilities, but remain susceptible to a growing spectrum of safety risks, including jailbreaks, toxic content, hallucinations, and bias. Existing defenses often address only a single threat type or resort to rigid, outright rejection, sacrificing user experience and failing to generalize across diverse and novel attacks. This paper introduces Adversarial Scenario Extrapolation (ASE), a novel inference-time computation framework that leverages Chain-of-Thought (CoT) reasoning to simultaneously enhance LLM robustness and seamlessness. ASE guides the LLM through a self-generative process of contemplating potential adversarial scenarios and formulating defensive strategies before generating a response to the user query. A comprehensive evaluation on four adversarial benchmarks with four of the latest LLMs shows that ASE achieves near-zero jailbreak…
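The abstract describes ASE as a chain-of-thought process: the model first contemplates potential adversarial scenarios, then formulates defensive strategies, and only then answers the user. The Python sketch below shows one way to wire those stages together, assuming a hypothetical `generate` function that maps a prompt string to model text. The stage prompts are illustrative placeholders, not the prompts used in the paper.

```python
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to model output text.
Generate = Callable[[str], str]


def ase_respond(query: str, generate: Generate) -> str:
    """Three-stage CoT pipeline loosely modeled on the ASE description.

    The prompt wording is an assumption for illustration only.
    """
    # Stage 1: the model imagines how the query could be adversarial.
    scenarios = generate(
        "List potential adversarial scenarios (e.g. jailbreak attempts, requests "
        "for toxic or biased content, hallucination traps) that this query could "
        f"represent:\n{query}"
    )
    # Stage 2: the model plans a defensive strategy for each scenario.
    strategies = generate(
        "For each scenario below, describe a strategy for answering safely "
        f"while still being helpful:\n{scenarios}"
    )
    # Stage 3: the user-facing answer is generated conditioned on the plan.
    return generate(
        f"User query:\n{query}\n\n"
        f"Considered scenarios:\n{scenarios}\n\n"
        f"Defensive strategies:\n{strategies}\n\n"
        "Using the strategies above, write the final response to the user."
    )
```

Only the final stage's output is returned, so the intermediate reasoning stays internal to the pipeline. How well this avoids rigid refusals while blocking attacks depends on the model and the actual prompts, which this sketch does not attempt to reproduce.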

Videos

Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models · Underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection