Let the Fuzzy Rule Speak: Enhancing In-context Learning Debiasing with Interpretability
Ruixi Lin, Yang You

TL;DR
This paper introduces FuRud, a fuzzy rule-based method that corrects class probability imbalances in large language models during in-context learning (ICL). It improves both overall accuracy and the balance of accuracy across classes without retraining the models.
Contribution
FuRud is a novel, interpretable approach that uses fuzzy sets to perform instance-specific probability corrections, addressing class accuracy imbalance in ICL without retraining the model.
Findings
Reduces class accuracy bias by over 56%
Improves overall accuracy by 21% (relative)
Outperforms existing debiasing methods across datasets
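To make "class accuracy bias" concrete, the following sketch computes per-class accuracy and one plausible bias measure: the mean absolute difference in accuracy over all unordered class pairs. This definition is an assumption for illustration, since this page does not state the exact metric behind the 56% figure. The function names `class_accuracies` and `pairwise_accuracy_bias` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[float]:
    """Per-class accuracy: fraction of each class's true instances predicted correctly."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Classes with no true instances are reported as 0.0 to avoid dividing by zero.
    return [c / n if n else 0.0 for c, n in zip(correct, total)]


def pairwise_accuracy_bias(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean absolute accuracy gap over all unordered class pairs (assumed metric)."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    accs = class_accuracies(y_true, y_pred, num_classes)
    pairs = list(combinations(range(num_classes), 2))
    return sum(abs(accs[i] - accs[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    # Illustrative labels only: class 0 absorbs predictions meant for classes 1 and 2.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 0, 2]
    print(class_accuracies(y_true, y_pred, 3))        # → [1.0, 0.5, 0.5]
    print(pairwise_accuracy_bias(y_true, y_pred, 3))  # about 0.333
```

In this toy example, the over-predicted class 0 reaches perfect accuracy while classes 1 and 2 drop to 0.5, which gives a bias of 1/3. Reducing such a number without sacrificing overall accuracy is the kind of improvement the findings describe.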
Abstract
Large language models (LLMs) often struggle with balanced class accuracy in text classification tasks using in-context learning (ICL), hindering some practical uses due to user dissatisfaction or safety risks caused by misclassifications. Retraining LLMs to address root causes in data or model priors is neither easy nor cost-effective. This paper delves deeper into the class accuracy imbalance issue, identifying that it arises because certain classes consistently receive disproportionately high ICL probabilities, causing under-prediction and lower accuracy for others. More importantly, probability ranges affect the imbalance differently, allowing for precise, range-specific corrections. We introduce FuRud (Fuzzy Rule Optimization-based Debiasing), a method for sample-level class probability correction. FuRud tackles interpretability challenges by determining why certain classes need…
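The abstract's observation that different probability ranges affect the imbalance differently suggests a per-class transform that treats low and high probabilities differently. The sketch below assumes that each class is assigned a fuzzy membership function. That function is applied to the class's raw ICL probability, and the class with the highest membership degree is predicted. The triangular shape, its parameters, and the names `triangular`, `identity`, and `correct_and_predict` are hypothetical choices for illustration. The sketch does not reproduce how FuRud actually selects membership functions (the "rule optimization" part).

```python
from typing import Callable, List, Sequence, Tuple

# A membership function maps a probability in [0, 1] to a degree in [0, 1].
Membership = Callable[[float], float]


def identity(x: float) -> float:
    """Hypothetical 'no correction' option: keep the raw probability."""
    return x


def triangular(a: float, b: float, c: float) -> Membership:
    """Hypothetical triangular fuzzy set: 0 outside (a, c), peaking at 1 when x == b."""
    if not a < b < c:
        raise ValueError("triangular membership requires a < b < c")

    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)  # rising edge on (a, b]
        return (c - x) / (c - b)      # falling edge on (b, c)

    return mu


def correct_and_predict(
    probs: Sequence[float], memberships: Sequence[Membership]
) -> Tuple[int, List[float]]:
    """Apply each class's membership function to its probability, then take the argmax."""
    if len(probs) != len(memberships):
        raise ValueError("need one membership function per class")
    scores = [mu(p) for mu, p in zip(memberships, probs)]
    pred = max(range(len(scores)), key=scores.__getitem__)
    return pred, scores


if __name__ == "__main__":
    # Class 0 is assumed to be over-favored by the LLM, so it gets a fuzzy set
    # whose falling edge shrinks high probabilities; other classes are unchanged.
    memberships = [triangular(0.0, 0.2, 0.8), identity, identity]
    probs = [0.7, 0.25, 0.05]  # illustrative ICL probabilities, not real data
    pred, scores = correct_and_predict(probs, memberships)
    print(pred)                           # → 1 (raw argmax would be 0)
    print([round(s, 3) for s in scores])
```

Under these example parameters, the correction is range-specific. A class-0 probability of 0.7 falls to about 0.167, while a low probability such as 0.1 rises to 0.5. Because each adjustment depends only on where a probability falls within its class's fuzzy set, it can be read as a rule such as "if class 0's probability is high, reduce it." This is one way to interpret the interpretability claim in the title.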
Taxonomy
Topics: Interpreting and Communication in Healthcare
