Seirênes: Adversarial Self-Play with Evolving Distractions for LLM Reasoning

Chi Zhang, Haibo Qiu, Qiming Zhang, Yufei Xu, Xinbo Gao, Jing Zhang

arXiv:2605.11636 · cs.AI · May 13, 2026

TL;DR

Seirênes introduces an adversarial self-play framework that makes LLM reasoning more robust: a single parameter-shared model co-evolves as both the generator of distracting contexts and the solver that must see past them, improving performance across benchmarks.

Contribution

The paper presents a novel self-play RL method in which one model both generates challenging, distracting contexts and solves problems embedded in them, yielding more resilient reasoning in LLMs.

Findings

1. Seirênes improves reasoning accuracy by over 7 points on average across seven benchmarks.

2. Distracting contexts generated by Seirênes-trained models reduce the accuracy of GPT and Gemini models by 4 to 5 points.

3. The framework scales effectively across model sizes from 4B to 30B parameters.

Abstract

We present Seirênes, a self-play RL framework that transforms contextual interference from a failure mode of LLM reasoning into an internal training signal for co-evolving more resilient reasoners. While RL with verifiable rewards has significantly advanced reasoning capabilities, models can still exhibit fragility when encountering non-idealized contexts: scenarios characterized by superfluous information, tangential instructions, or incidental correlations that differ from the clean distributions typical of standard benchmarks. Seirênes harnesses this vulnerability through a parameter-shared and adversarial self-play loop. Within this framework, a single model is trained to both construct plausible yet distracting contexts that expose its own reasoning blind spots, and solve problems by discerning the essential task from these perturbations to recover the core underlying logic. By…
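The abstract above is truncated and does not spell out the reward design, so what follows is only a minimal Python sketch of how one parameter-shared challenger/solver round might be scored, under stated assumptions. Everything here is hypothetical and not the paper's method: the `[CHALLENGER]`/`[SOLVER]` role prefixes, the `Episode` schema, the `is_valid` answer-preservation check, the choice to reward the challenger by the accuracy drop its distraction causes, and the GRPO-style group normalization of advantages.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable

# Hypothetical interface: ONE model (text in, text out) plays both roles,
# which is what "parameter-shared" self-play means in this sketch.
Policy = Callable[[str], str]


@dataclass
class Episode:
    """Verifier outcomes for one challenger proposal (hypothetical schema)."""
    clean_correct: list[bool]       # solver attempts on the original problem
    distracted_correct: list[bool]  # solver attempts on distraction + problem
    answer_preserved: bool          # assumed check: distraction keeps the gold answer unchanged


def accuracy(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)


def play_round(
    policy: Policy,
    problem: str,
    gold: str,
    is_correct: Callable[[str, str], bool],
    is_valid: Callable[[str, str], bool],
    k: int = 4,
) -> Episode:
    # Challenger role: the shared model writes a plausible but irrelevant context.
    distraction = policy(f"[CHALLENGER] Write a plausible distracting context for:\n{problem}")
    distracted_problem = f"{distraction}\n\n{problem}"
    # Solver role: the same model answers k times with and without the distraction.
    clean = [is_correct(policy(f"[SOLVER] {problem}"), gold) for _ in range(k)]
    distracted = [is_correct(policy(f"[SOLVER] {distracted_problem}"), gold) for _ in range(k)]
    return Episode(clean, distracted, is_valid(problem, distraction))


def solver_rewards(ep: Episode) -> list[float]:
    # Verifiable reward: 1.0 for a correct final answer under distraction, else 0.0.
    return [1.0 if ok else 0.0 for ok in ep.distracted_correct]


def challenger_reward(ep: Episode) -> float:
    if not ep.answer_preserved:
        return -1.0  # penalize "distractions" that change the task itself
    clean_acc = accuracy(ep.clean_correct)
    if clean_acc == 0.0:
        return 0.0  # solver fails even without distraction: no interference signal
    # Reward the accuracy drop the distraction causes, clipped at zero.
    return max(0.0, clean_acc - accuracy(ep.distracted_correct))


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style normalization across a group of rollouts for the same prompt.
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, the challenger earns reward only on problems the solver can already handle cleanly, so it is pushed toward distractions that exploit interference rather than toward simply posing harder problems; and because both roles call the same `policy`, any update that makes the solver harder to distract also raises the bar for the challenger.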
