Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs
Aofan Liu, Jingxiang Meng

TL;DR
This paper shows that large language models often fail to keep the requested output format when prompts are paraphrased, revealing a systematic output-mode collapse across five compact LLMs and four task types.
Contribution
The authors introduce PARACONSIST, a 900-prompt benchmark built from 150 base queries with five prompt variants each, and a Semantic Consistency Score for measuring output-mode robustness in LLMs.
Findings
Only about 22% of responses preserve the original label under prompt variations.
Task structure influences output-mode collapse more than model identity does.
Response-mode preservation is crucial for reliable LLM deployment.
Abstract
When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational prose, the requested format dissolves, and exact-match evaluation pipelines silently misjudge the result. To make this measurable, we release PARACONSIST, a 900-prompt benchmark of 150 base queries with five lexical, syntactic, and semantic-expansion prompt variants each, and a Semantic Consistency Score that decomposes prompt-variant robustness into answer consistency, sentence-BERT semantic…
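The gap between exact-match scoring and the actual answer can be illustrated with a minimal Python sketch. It is not the paper's Semantic Consistency Score or any PARACONSIST code: the label set, the function names (`is_bare_label`, `extract_label`, `judge`), and the example responses are all hypothetical, chosen only to show how a correct answer wrapped in prose fails a strict comparison.

```python
import re

# Hypothetical label set for a closed-form sentiment task (an assumption, not from the paper).
LABELS = {"positive", "negative", "neutral"}


def is_bare_label(response: str, labels=LABELS) -> bool:
    """True if the response is a single allowed label, ignoring case,
    surrounding whitespace, and a trailing period."""
    cleaned = response.strip().rstrip(".").lower()
    return cleaned in labels


def extract_label(response: str, labels=LABELS):
    """Return the one label mentioned in the response, or None if the
    response mentions zero labels or more than one distinct label."""
    words = re.findall(r"[a-z]+", response.lower())
    found = {w for w in words if w in labels}
    return found.pop() if len(found) == 1 else None


def judge(response: str, gold: str) -> dict:
    """Score one response three ways: strict exact match, whether the
    bare-label output mode survived, and whether the answer itself is right."""
    return {
        "exact_match": response.strip().lower() == gold,
        "mode_preserved": is_bare_label(response),
        "answer_correct": extract_label(response) == gold,
    }


# Base prompt: the model answers with a bare label.
base = judge("positive", "positive")

# Paraphrased prompt: same answer, but delivered as conversational prose.
variant = judge("I'd say this review is clearly positive overall!", "positive")

print(base)
print(variant)
```

In this sketch the paraphrased response carries the right answer (`answer_correct` is true) but fails both `exact_match` and `mode_preserved`, which is the kind of silent misjudgment the abstract describes. Tracking format preservation separately from answer correctness makes that failure visible instead of counting it as a wrong answer.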
