Set-LLM: A Permutation-Invariant LLM
Beni Egressy, Jan Stühmer

TL;DR
This paper introduces Set-LLM, an architecture that makes large language models permutation-invariant, reducing order bias and improving robustness in tasks involving set inputs without sacrificing performance or runtime.
Contribution
Set-LLM is the first approach to adapt pretrained LLMs for permutation invariance through novel attention masks and positional encodings, with theoretical guarantees and practical effectiveness.
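To make the idea of a set-aware attention mask and positional encoding concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The scheme is an assumption chosen for illustration: every set element restarts its position ids right after the preceding text, tokens of one element cannot attend to tokens of another element, and the text that follows starts after the longest element. The names `build_set_layout`, `embed`, and `attend_last` are hypothetical, and the embedding is a deterministic toy stand-in for learned weights.

```python
import math
from itertools import permutations

def build_set_layout(prefix, elements, suffix):
    """Return (tokens, position_ids, mask) for a prefix + set + suffix input.

    Assumed scheme (illustrative, not taken from the paper):
    - every set element restarts its positions right after the prefix;
    - tokens of one element cannot attend to tokens of another element;
    - suffix positions start after the longest element.
    """
    tokens, pos, group = [], [], []
    for i, tok in enumerate(prefix):
        tokens.append(tok); pos.append(i); group.append(-1)  # -1 marks ordinary text
    start = len(prefix)
    for g, elem in enumerate(elements):
        for j, tok in enumerate(elem):
            tokens.append(tok); pos.append(start + j); group.append(g)
    after = start + max((len(e) for e in elements), default=0)
    for j, tok in enumerate(suffix):
        tokens.append(tok); pos.append(after + j); group.append(-1)

    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # causal: keys at or before the query only
            # Allowed if either side is ordinary text or both sit in the same element.
            mask[q][k] = group[q] == -1 or group[k] == -1 or group[q] == group[k]
    return tokens, pos, mask

def embed(token, position, dim=4):
    """Toy deterministic embedding of (token, position); stands in for learned weights."""
    h = sum(ord(c) for c in token)
    return [math.sin(h * (d + 1) + 0.1 * position * (d + 1)) for d in range(dim)]

def attend_last(tokens, pos, mask, dim=4):
    """Single masked softmax-attention read-out for the final token."""
    x = [embed(t, p, dim) for t, p in zip(tokens, pos)]
    query = x[-1]
    keys = [k for k in range(len(tokens)) if mask[-1][k]]
    scores = [sum(a * b for a, b in zip(query, x[k])) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = math.fsum(weights)  # fsum is exactly rounded, so summation order does not matter
    return [math.fsum(w * x[k][d] for w, k in zip(weights, keys)) / total
            for d in range(dim)]

prefix = ["Which", "is", "largest", "?"]
elements = [["A", "cat"], ["A", "blue", "whale"], ["An", "ant"]]
suffix = ["Answer", ":"]

# Run the read-out for every ordering of the set elements.
outputs = [attend_last(*build_set_layout(prefix, list(order), suffix))
           for order in permutations(elements)]
invariant = all(math.isclose(a, b, abs_tol=1e-12)
                for out in outputs for a, b in zip(out, outputs[0]))
print(invariant)
```

Under these assumptions, the final token sees the same collection of (token, position) pairs for every ordering of the elements, so its attention output does not change. The sketch covers a single attention read-out only. In a deeper model the same mask would also keep each element's hidden states independent of the other elements, which is what lets the property carry through the layers.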
Findings
Set-LLM achieves permutation invariance in LLMs.
Set-LLM maintains or improves performance on set-based tasks.
Set-LLM does not increase runtime compared to original models.
Abstract
While large language models (LLMs) demonstrate impressive capabilities across numerous applications, their robustness remains a critical concern. This paper is motivated by a specific vulnerability: the order sensitivity of LLMs. This vulnerability manifests itself as the order bias observed when LLMs decide between possible options (for example, a preference for the first option) and the tendency of LLMs to provide different answers when options are reordered. The use cases for this scenario extend beyond the classical case of multiple-choice question answering to the use of LLMs as automated evaluators in AI pipelines, comparing output generated by different models. We introduce Set-LLM, a novel architectural adaptation for pretrained LLMs that enables the processing of mixed set-text inputs with permutation invariance guarantees. The adaptations involve a new attention mask and new…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
