Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models
Zhuoyun Li, Boxuan Wang, Jinwei Hu, Zhenglin Huang, Qisong He, Xinmiao Huang, Guangliang Cheng, Xiaowei Huang, Yi Dong

TL;DR
This paper introduces S$^2$R$^2$, a segment-level robustness framework for LoRA-tuned language models. It improves resistance to prompt perturbations by aligning the semantic segments of clean and perturbed generations and by regularising adapter stability.
Contribution
The paper proposes a segment-level approach that combines an optimal-transport alignment objective with an adapter-stability regulariser, improving robustness and transferability in language model fine-tuning.
Findings
S$^2$R$^2$ improves robustness against typographical noise, deletion, and paraphrasing.
S$^2$R$^2$ maintains competitive performance on clean data.
S$^2$R$^2$ achieves stronger cross-dataset transfer than baseline methods.
Abstract
Large language models are sensitive to minor prompt perturbations, yet existing robustness methods usually enforce consistency at the whole-sequence level. This holistic view can hide an important failure mode: a perturbed response may remain globally similar to the clean one while drifting on a critical entity, relation, or conclusion. We introduce S$^2$R$^2$, a segment-level framework for robust LoRA fine-tuning. S$^2$R$^2$ decomposes clean and perturbed generations into semantic segments, aligns them with an optimal-transport objective, and penalises the segments with the largest meaning drift. To connect this output-side objective with model adaptation, we add an adapter-stability regulariser motivated by segment-level attention reallocation, using LoRA norm control as a tractable proxy for limiting perturbation-amplified evidence shifts. A PAC-Bayesian complexity view further…
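To make the segment-level idea concrete, the following is a minimal sketch of aligning clean and perturbed segment embeddings with entropy-regularised optimal transport and averaging the largest per-segment drifts. It is not the authors' implementation: the cosine cost, uniform marginals, Sinkhorn solver, the `eps` and `k` parameters, and the function names `cosine_cost`, `sinkhorn_plan`, and `topk_segment_drift` are all assumptions chosen for illustration, and the segment embeddings are assumed to come from some external encoder.

```python
import numpy as np

def cosine_cost(a, b):
    """Pairwise cosine distance between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularised transport plan with uniform marginals (assumed choice)."""
    n, m = cost.shape
    r = np.full(n, 1.0 / n)   # mass on each clean segment
    c = np.full(m, 1.0 / m)   # mass on each perturbed segment
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = c / (K.T @ u)     # scale columns towards marginal c
        u = r / (K @ v)       # scale rows towards marginal r
    return u[:, None] * K * v[None, :]

def topk_segment_drift(clean, perturbed, k=2, eps=0.1):
    """Return (mean of the k largest per-segment drifts, all per-segment drifts)."""
    cost = cosine_cost(clean, perturbed)
    plan = sinkhorn_plan(cost, eps=eps)
    # Drift of a clean segment: average cost of where its mass is transported.
    drift = (plan * cost).sum(axis=1) / plan.sum(axis=1)
    k = min(k, drift.size)
    worst = np.sort(drift)[-k:]
    return worst.mean(), drift

# Hypothetical toy embeddings: three clean segments, the third one drifts.
clean = np.eye(3)
perturbed = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0]])
loss, drift = topk_segment_drift(clean, perturbed, k=1)
```

In this toy case the third clean segment has no close counterpart among the perturbed segments, so it receives the largest drift and dominates the top-k penalty even though the other two segments match exactly. A whole-sequence similarity score would average that failure away, which is the failure mode the abstract describes.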
