RoParQ: Paraphrase-Aware Alignment of Large Language Models Towards Robustness to Paraphrased Questions
Minjoon Choi

TL;DR
This paper introduces RoParQ, a benchmark for evaluating paraphrase consistency in LLMs, and proposes a fine-tuning method that significantly improves model robustness to paraphrased questions.
Contribution
We present RoParQ, a novel benchmark for cross-paraphrase consistency, and XParaCon, a new metric for robustness, along with a paraphrase-aware fine-tuning strategy that enhances LLM reliability.
Findings
Paraphrase-aware fine-tuning improves robustness to paraphrased questions.
Lightweight models achieve performance comparable to larger models.
Our approach reduces superficial memorization in LLMs.
Abstract
Large Language Models (LLMs) often exhibit inconsistent behavior when answering paraphrased questions, suggesting a reliance on surface-level patterns rather than true semantic understanding. To address this limitation, we introduce RoParQ, a benchmark specifically constructed to evaluate cross-paraphrase consistency in closed-book multiple-choice QA. This benchmark is derived from standard datasets by generating paraphrases via proprietary models and selectively retaining examples that elicit inconsistent confidence from a judge model. We further propose XParaCon, a novel evaluation metric that quantifies a model's robustness by measuring the standard deviation of accuracies across question variants. Additionally, we implement a reasoning-based, paraphrase-aware Supervised Fine-Tuning (SFT) strategy designed to align models toward semantic invariance. Our experiments demonstrate that…
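The abstract describes XParaCon only as a measure based on the standard deviation of accuracies across question variants, so the sketch below illustrates that core quantity and nothing more. The function names, the input layout, and the choice of population standard deviation are assumptions made for illustration; any further scaling or transformation the paper applies to obtain the final XParaCon score is not reproduced here.

```python
from statistics import pstdev


def variant_accuracies(predictions, gold):
    """Accuracy of each question variant.

    predictions[v][i] is the option chosen for question i when it is
    asked in variant v (original wording or one of its paraphrases).
    gold[i] is the correct option for question i.
    """
    return [
        sum(p == g for p, g in zip(variant_preds, gold)) / len(gold)
        for variant_preds in predictions
    ]


def accuracy_spread(predictions, gold):
    """Standard deviation of per-variant accuracies.

    Assumption: population standard deviation is used. A value of 0.0
    means accuracy is identical under every wording of the questions.
    """
    return pstdev(variant_accuracies(predictions, gold))


# Hypothetical toy data: four questions, each asked in three wordings.
gold = ["A", "B", "C", "D"]
predictions = [
    ["A", "B", "C", "D"],  # original wording
    ["A", "B", "C", "A"],  # paraphrase 1
    ["A", "B", "D", "A"],  # paraphrase 2
]

print(variant_accuracies(predictions, gold))
print(accuracy_spread(predictions, gold))
```

A lower spread indicates that accuracy depends less on how the question is worded. Note that this quantity compares accuracies at the variant level, so it does not by itself show whether the same individual questions are answered consistently across paraphrases.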
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
