Mitigating Selection Bias with Node Pruning and Auxiliary Options
Hyeong Kyu Choi, Weijie Xu, Chi Xue, Stephanie Eckman, Chandan K. Reddy

TL;DR
This paper introduces two novel methods, Bias Node Pruning and Auxiliary Option Injection, to reduce selection bias in large language models, improving accuracy and reliability in multiple-choice tasks.
Contribution
It presents a bias mitigation approach that removes internal sources of bias, and it introduces a new evaluation metric, CKLD, for measuring selection bias.
Findings
The proposed methods improve answer accuracy across models and datasets.
Bias reduction is effective in both white-box and black-box settings.
The proposed metric, CKLD, better captures distributional bias.
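To make the idea of a distribution-level bias metric concrete, here is a minimal sketch of a KL divergence between the distribution of ground-truth answer labels and the distribution of predicted labels. This summary does not give the exact definition of CKLD, so the direction of the divergence, the smoothing constant `eps`, and the function names `choice_distribution` and `choice_kl` are all assumptions made for illustration.

```python
import math
from collections import Counter


def choice_distribution(labels, choices, eps=1e-9):
    """Empirical distribution over answer labels, smoothed to avoid log(0).

    `eps` is an illustrative smoothing constant, not taken from the paper.
    """
    counts = Counter(labels)
    total = len(labels)
    return [(counts[c] + eps) / (total + eps * len(choices)) for c in choices]


def choice_kl(true_labels, predicted_labels, choices):
    """KL(p || q), with p from ground-truth labels and q from predictions.

    Assumed form only: the summary does not state the exact CKLD formula.
    """
    p = choice_distribution(true_labels, choices)
    q = choice_distribution(predicted_labels, choices)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


choices = list("ABCD")
balanced_truth = ["A", "B", "C", "D"]
skewed_predictions = ["A", "A", "A", "B"]  # a model that over-selects "A"

no_bias = choice_kl(balanced_truth, balanced_truth, choices)
with_bias = choice_kl(balanced_truth, skewed_predictions, choices)
```

Under this reading, the value is zero when predictions follow the same label distribution as the ground truth and grows as the model concentrates on particular choices.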
Abstract
Large language models (LLMs) often exhibit systematic preferences for certain answer choices when responding to multiple-choice questions, a behavior known as selection bias. This bias reduces the accuracy and reliability of LLM outputs, limiting their usefulness in decision-critical applications. While prior work has focused on adjusting model inputs or outputs to mitigate this issue, our work takes a fundamentally different approach by identifying and removing the internal sources of bias. We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an additional answer choice to reduce bias in both white-box and black-box settings. To address the shortcomings of existing evaluation metrics, we propose Choice Kullback-Leibler Divergence (CKLD), a new metric that captures…
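Because Auxiliary Option Injection only changes the prompt, it can be sketched without access to model weights. The example below is a minimal illustration, assuming a simple lettered prompt template and an "I don't know" auxiliary option (the option text mentioned in the reviews); the helper name `build_prompt` and the template itself are hypothetical, not the authors' implementation.

```python
from string import ascii_uppercase


def build_prompt(question, options, auxiliary_option=None):
    """Format a multiple-choice prompt, optionally appending one extra option.

    The template is an assumption for illustration; the paper's exact
    prompt format is not given in this summary.
    """
    choices = list(options)
    if auxiliary_option is not None:
        # The injected option is placed last so original labels are unchanged.
        choices.append(auxiliary_option)
    lines = [f"Question: {question}"]
    for label, text in zip(ascii_uppercase, choices):
        lines.append(f"{label}. {text}")
    lines.append("Answer:")
    return "\n".join(lines)


options = ["Berlin", "Paris", "Rome", "Madrid"]
plain_prompt = build_prompt("What is the capital of France?", options)
aoi_prompt = build_prompt(
    "What is the capital of France?", options, auxiliary_option="I don't know"
)
```

Appending the extra option at the end keeps the labels of the original choices fixed, so predictions with and without the auxiliary option can be compared directly.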
Peer Reviews
Decision · Submitted to ICLR 2025
- The paper is well-written, and the methods are straightforward.
- The authors conduct experiments with different model architectures.
- Pruning model weights can influence model behavior in unforeseen ways, especially since large language models (LLMs) are intended to be general-purpose.
- The BNP method appears unstable and offers only marginal performance improvements.
- Simply adding the "I don't know" option provides the best overall results; this is expected, given similar behavior observed in older models with the SQuAD v1 and SQuAD v2 datasets.
- The paper uses a limited number of datasets to demonstrate the method's e
* This paper is well written.
* The selection bias problem of LLMs is well-motivated.
* The proposed techniques could effectively mitigate the selection bias on MCQs on the considered models and tasks.
* To me, Section 2 is lacking some details and needs more discussions and clarification. For example, what is the specific setting used to produce Figure 2? Is it evaluated based on zero-shot or in-context learning (or does this matter in terms of selection bias)? Are there any scaling trends since some works suggest that smaller LMs struggled to answer MCQs with the correct format under zero-shot, which might have different behavior on selection bias? Are the open-weight LMs and black-box LMs e
With their framework, they could improve the performance of LLMs on multiple-choice QA.
- The primary problem of the paper is the experiment claiming that selection bias stems from the final decoder layer. The authors analyze the embedding differences between correct and incorrect questions within the permutation and observe a significant norm difference only in the last layer. They look at the embedding differences at each position but only report the last 50 token positions. Firstly, there cannot be any difference in the embeddings at earlier positions because LLaMA3 is a decoder
Taxonomy
Topics: Game Theory and Applications · Experimental Behavioral Economics Studies
Methods: Pruning · Linear Layer
