Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs

Aofan Liu; Jingxiang Meng

arXiv:2605.04665·cs.CL·May 12, 2026

TL;DR

This paper shows that large language models often abandon the requested output format when a prompt is paraphrased, a systematic output-mode collapse observed across models and task types.

Contribution

The authors introduce PARACONSIST, a 900-prompt benchmark (150 base queries, each with five prompt variants), together with a Semantic Consistency Score for measuring output-mode robustness in LLMs.

Findings

01

Only about 22% of responses preserve the original label under prompt variations.

02

Task structure influences output-mode collapse more than model identity.

03

Response-mode preservation is crucial for reliable LLM deployment.

Abstract

When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational prose, the requested format dissolves, and exact-match evaluation pipelines silently misjudge the result. To make this measurable, we release PARACONSIST, a 900-prompt benchmark of 150 base queries with five lexical, syntactic, and semantic-expansion prompt variants each, and a Semantic Consistency Score that decomposes prompt-variant robustness into answer consistency, sentence-BERT semantic…
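The failure mode described in the abstract, where a correct answer wrapped in conversational prose is scored as wrong by exact match, can be illustrated with a minimal sketch. This is not the paper's Semantic Consistency Score or its evaluation code; the function names, the label set, and the three outcome categories are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional


def exact_match(response: str, gold: str) -> bool:
    """Strict scoring: the whole response must equal the gold label."""
    return response.strip().lower() == gold.strip().lower()


def is_bare_label(response: str, labels: List[str]) -> bool:
    """True if the response is a single allowed label (case and trailing '.'/'!' ignored)."""
    cleaned = response.strip().strip(".!").lower()
    return cleaned in {label.lower() for label in labels}


def extract_label(response: str, labels: List[str]) -> Optional[str]:
    """Return the one allowed label mentioned in the response, or None if zero or several appear."""
    found = {
        label
        for label in labels
        if re.search(rf"\b{re.escape(label)}\b", response, re.IGNORECASE)
    }
    return found.pop() if len(found) == 1 else None


def classify(response: str, gold: str, labels: List[str]) -> str:
    """Separate format failures from answer failures (hypothetical categories)."""
    if is_bare_label(response, labels):
        bare = response.strip().strip(".!")
        return "correct" if exact_match(bare, gold) else "wrong"
    if extract_label(response, labels) == gold:
        # Right answer, wrong output mode: strict exact match would count this as an error.
        return "mode_collapse"
    return "wrong"


labels = ["positive", "negative"]
prose = "I think the review is positive overall."
print(exact_match(prose, "positive"))       # strict scorer rejects the prose answer
print(classify(prose, "positive", labels))  # lenient pass flags it as a format failure
```

Reporting the `mode_collapse` bucket separately keeps an exact-match pipeline from silently folding format failures into the error rate.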
