PEARL: Towards Permutation-Resilient LLMs

Liang Chen; Li Shen; Yang Deng; Xiaoyan Zhao; Bin Liang; Kam-Fai Wong

arXiv:2502.14628 · cs.LG · February 21, 2025

TL;DR

This paper introduces PEARL, a novel training framework that significantly improves the permutation robustness of large language models by optimizing against worst-case input permutations, thereby enhancing safety and reliability.

Contribution

PEARL is the first method to incorporate distributionally robust optimization for permutation resilience in LLMs, using a permutation-proposal network and minimax training.
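
As a rough illustration of what a minimax, DRO-style objective over permutations can look like, the sketch below uses notation that is assumed here rather than taken from the paper: \theta denotes the LLM parameters, D a set of demonstrations, \Pi the set of orderings of D, (x, y) a query and its target, and \ell a per-example loss.

```latex
\min_{\theta} \; \mathbb{E}_{(D,\, x,\, y)} \Big[ \max_{\pi \in \Pi} \; \ell\big(\theta;\ \pi(D),\ x,\ y\big) \Big]
```

Under this reading, standard fine-tuning would replace the inner maximum with a single fixed or random ordering, while the permutation-proposal network plays the role of approximating the inner maximum without enumerating all |\Pi| = n! orderings.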

Findings

01

PEARL reduces vulnerability to permutation attacks by up to 80%.

02

It achieves performance gains of up to 40% in many-shot, long-context tasks.

03

PEARL enhances model robustness with fewer training samples and shorter contexts.
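
To make the notion of a permutation attack concrete, here is a minimal Python sketch that exhaustively searches demonstration orderings for a small number of shots. Everything in it is an assumption for illustration: the names `worst_case_ordering` and `attack_success_rate`, the definition of success (at least one ordering yields a wrong prediction), and the toy model with a recency bias. It is not the paper's evaluation code.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Demo = Tuple[str, str]  # (input text, label) used as one in-context demonstration


def worst_case_ordering(
    demos: Sequence[Demo],
    loss_fn: Callable[[Sequence[Demo]], float],
) -> Tuple[Tuple[Demo, ...], float]:
    """Search all n! orderings and return the first one with the highest loss."""
    best_order, best_loss = None, float("-inf")
    for order in permutations(demos):
        loss = loss_fn(order)
        if loss > best_loss:
            best_order, best_loss = order, loss
    return best_order, best_loss


def attack_success_rate(
    examples: Sequence[Sequence[Demo]],
    is_correct: Callable[[Sequence[Demo]], bool],
) -> float:
    """Fraction of examples where at least one ordering gives a wrong prediction.

    This definition of "success" is an assumption made for this sketch.
    """
    broken = sum(
        any(not is_correct(order) for order in permutations(demos))
        for demos in examples
    )
    return broken / len(examples)


# Hypothetical stand-in for a model with a recency bias: the prediction is
# wrong whenever the last demonstration carries the label "neg".
def toy_is_correct(order: Sequence[Demo]) -> bool:
    return order[-1][1] != "neg"


def toy_loss(order: Sequence[Demo]) -> float:
    return 0.0 if toy_is_correct(order) else 1.0


examples = [
    [("a", "pos"), ("b", "neg"), ("c", "pos")],  # some ordering ends in "neg"
    [("d", "pos"), ("e", "pos"), ("f", "pos")],  # no ordering can end in "neg"
]
rate = attack_success_rate(examples, toy_is_correct)
order, loss = worst_case_ordering(examples[0], toy_loss)
```

Exhaustive search is only practical for a handful of demonstrations, since the number of orderings grows as n!; with a real model, `loss_fn` and `is_correct` would each require a forward pass per ordering.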

Abstract

The in-context learning (ICL) capability of large language models (LLMs) enables them to perform challenging tasks using provided demonstrations. However, ICL is highly sensitive to the ordering of demonstrations, leading to instability in predictions. This paper shows that this vulnerability can be exploited to design a natural attack, difficult for model providers to detect, that achieves a nearly 80% success rate on LLaMA-3 by simply permuting the demonstrations. Existing mitigation methods primarily rely on post-processing and fail to enhance the model's inherent robustness to input permutations, raising concerns about the safety and reliability of LLMs. To address this issue, we propose Permutation-resilient learning (PEARL), a novel framework based on distributionally robust optimization (DRO), which optimizes model performance against the worst-case input permutation. Specifically,…

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

+ The paper combines distributionally robust optimization using transport maps with the min-max problem of learning over worst-case perturbations, which appears to be a unique approach.
+ The attacker model for finding the worst-case perturbation appears to have favorable computational complexity.
+ The robustness approach improves upon random and mixup-based baselines.

Weaknesses

Most of this feedback is centered around how this work is situated against / compares to existing work in the literature.
+ While the authors appear to be aware of some other works studying the fragility of ICL to demonstration order, I felt the paper did not situate their work relative to the existing studies. There is a non-trivial body of work on specifically studying demonstration ordering that has come out since the "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Sho

Reviewer 02 · Rating 8 · Confidence 4

Strengths

1. Empirical results on LLM fine-tuning: the results indicating improved average- and worst-case performance in standard ICL benchmarks indicate that PEARL can be useful for practitioners as a part of their fine-tuning pipeline.
2. Effectiveness in the few-shot setting: In addition, the gains above are already noticeable with a small number of shots (2, 3 or 4), indicating the method does not require a very large number of in-context examples to be beneficial (if this were the case, it could hi

Weaknesses

1. The authors work only with Llama 3 8B. It would be relevant to assess generalizability to other models, in particular Llama 2 7B and 13B (from the previous generation), Mistral 7B v0.2 and Gemma 7B.
2. Lack of evaluations for many-shot settings: recent work in adversarial robustness has highlighted the vulnerability of LLMs to many-shot adversarial attacks (Anil et al. 2024). Hence, it would strengthen the paper to include additional evaluations where the number of shots

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. The paper offers a simple, straightforward, intuitive and, most importantly, well-performing solution to the problem of permutation-sensitivity.
2. The paper is clearly written and well-presented.
3. It is not immediately obvious how one can handle the combinatorial explosion of hard permutation mining with a neural network, but the authors propose an elegant solution using the Sinkhorn operator and Gumbel sampling.
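
For readers unfamiliar with the Sinkhorn operator and Gumbel sampling mentioned above, the following is a minimal pure-Python sketch of the general Gumbel-Sinkhorn idea: perturb a score matrix with Gumbel noise, then alternately normalize rows and columns so the result approaches a doubly stochastic matrix (a relaxed permutation). The function names, the temperature `tau`, and the iteration count are assumptions for illustration and are not taken from the PEARL codebase.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample by inverse transform."""
    u = max(rng.random(), 1e-12)  # keep u away from 0 to avoid log(0)
    return -math.log(-math.log(u))


def gumbel_sinkhorn(
    scores: Matrix,
    tau: float = 1.0,
    n_iters: int = 50,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Return an approximately doubly stochastic matrix from an n x n score matrix.

    Works in log space: add Gumbel noise, divide by the temperature, then
    alternate row and column normalization for n_iters rounds.
    """
    if rng is None:
        rng = random.Random()
    n = len(scores)
    log_p = [
        [(scores[i][j] + sample_gumbel(rng)) / tau for j in range(n)]
        for i in range(n)
    ]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1 in probability space.
        for i in range(n):
            z = _logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column normalization: each column sums to 1 in probability space.
        for j in range(n):
            z = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


# Example: a relaxed permutation over 4 demonstrations from hypothetical scores.
scores = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
P = gumbel_sinkhorn(scores, tau=2.0, n_iters=200, rng=random.Random(0))
```

Lower values of `tau` push the output closer to a hard permutation matrix at the cost of slower convergence. Turning the relaxed matrix into a discrete ordering is a separate step (one common choice is a linear assignment solver) and is not shown here.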

Weaknesses

1. The evaluations seem to be restricted to the 3-, 4- and 5-shot cases. However, these settings are quite small and one could even enumerate them and try each permutation without needing the P-Net at all. Currently, models have very large contexts and can have hundreds if not thousands of demonstrations (e.g., Many-Shot In-Context Learning, Agarwal et al., 2024). It is not clear from the paper whether such larger sets of demonstrations also exhibit such permutation-sensitivity. Furthermore, it

Code & Models

Repositories

chanliang/pearl · PyTorch · Official

Taxonomy

Topics: Access Control and Trust · Digital Rights Management and Security