RoParQ: Paraphrase-Aware Alignment of Large Language Models Towards Robustness to Paraphrased Questions

Minjoon Choi

arXiv:2511.21568 · cs.CL · November 27, 2025

Open Access

TL;DR

This paper introduces RoParQ, a benchmark for evaluating paraphrase consistency in LLMs, and proposes a fine-tuning method that significantly improves model robustness to paraphrased questions.

Contribution

We present RoParQ, a novel benchmark for cross-paraphrase consistency, and XParaCon, a new metric for robustness, along with a paraphrase-aware fine-tuning strategy that enhances LLM reliability.

Findings

01

Fine-tuning improves robustness to paraphrased questions.

02

Fine-tuned lightweight models achieve performance comparable to larger models.

03

Our approach reduces superficial memorization in LLMs.

Abstract

Large Language Models (LLMs) often exhibit inconsistent behavior when answering paraphrased questions, suggesting a reliance on surface-level patterns rather than true semantic understanding. To address this limitation, we introduce RoParQ, a benchmark specifically constructed to evaluate cross-paraphrase consistency in closed-book multiple-choice QA. This benchmark is derived from standard datasets by generating paraphrases via proprietary models and selectively retaining examples that elicit inconsistent confidence from a judge model. We further propose XParaCon, a novel evaluation metric that quantifies a model's robustness by measuring the standard deviation of accuracies across question variants. Additionally, we implement a reasoning-based, paraphrase-aware Supervised Fine-Tuning (SFT) strategy designed to align models toward semantic invariance. Our experiments demonstrate that…
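
The abstract describes XParaCon only as measuring the standard deviation of accuracies across question variants. The sketch below illustrates that core quantity under stated assumptions: each question has the same number of variants (the original plus its paraphrases), accuracy is computed per variant position over all questions, and the population standard deviation is used. The function names `variant_accuracies` and `accuracy_spread` are hypothetical, and any scaling or normalization that the paper applies on top of this value is not reproduced here.

```python
from statistics import pstdev


def variant_accuracies(correct):
    """Return one accuracy per variant position.

    correct[q][v] is True when the model answered variant v of question q
    correctly. Assumes every question has the same number of variants.
    """
    if not correct:
        raise ValueError("need at least one question")
    n_variants = len(correct[0])
    if any(len(row) != n_variants for row in correct):
        raise ValueError("every question must have the same number of variants")
    n_questions = len(correct)
    return [
        sum(1 for row in correct if row[v]) / n_questions
        for v in range(n_variants)
    ]


def accuracy_spread(correct):
    """Population standard deviation of the per-variant accuracies.

    Illustrative only: a lower value means accuracy changes less from one
    phrasing to another. This is an assumption about the core of the metric,
    not the paper's exact definition.
    """
    return pstdev(variant_accuracies(correct))


# Hypothetical results: 4 questions, each asked in 3 phrasings.
results = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [True, False, False],
]
spread = accuracy_spread(results)
```

A model whose accuracy is identical on every phrasing gets a spread of zero under this sketch, while a model that answers the original wording well and the paraphrases poorly gets a larger value.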

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques