RAZOR: Sharpening Knowledge by Cutting Bias with Unsupervised Text Rewriting
Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

TL;DR
RAZOR is an unsupervised text rewriting method that reduces dataset biases in language models by iteratively replacing biased segments, leading to improved generalization and fairness without prior bias knowledge.
Contribution
Introduces RAZOR, a novel unsupervised text rewriting approach for debiasing language models, eliminating the need for prior bias information and enhancing model robustness.
Findings
RAZOR improves F1 scores by 3.5% on FEVER and by 6.5% on the MNLI and SNLI datasets.
Effectively mitigates known biases, halving the number of bias-related terms without requiring prior bias knowledge.
Achieves performance comparable to state-of-the-art models that rely on prior bias information.
Abstract
Despite the widespread use of LLMs, owing to their superior performance across various tasks, their high computational costs often lead potential users to opt for the pretraining-fine-tuning pipeline. However, biases prevalent in manually constructed datasets can introduce spurious correlations between tokens and labels, creating so-called shortcuts and hindering the generalizability of fine-tuned models. Existing debiasing methods often rely on prior knowledge of specific dataset biases, which is challenging to acquire a priori. We propose RAZOR (Rewriting And Zero-bias Optimization Refinement), a novel, unsupervised, and data-focused debiasing approach based on text rewriting for shortcut mitigation. RAZOR leverages LLMs to iteratively rewrite potentially biased text segments by replacing them with heuristically selected alternatives in a shortcut space defined by token statistics and…
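The abstract says RAZOR's shortcut space is defined partly by token statistics, but the full definition is cut off here. As a rough illustration only, the sketch below scores each token by how far its conditional label distribution p(label | token) departs from the label prior p(label); tokens with a large gap are candidate shortcuts. This is a hypothetical, simplified statistic, not RAZOR's actual scoring. The function name `shortcut_scores`, the `min_count` threshold, whitespace tokenization, and the toy NLI-style data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def shortcut_scores(examples, min_count=2):
    """Hypothetical token-label statistic for spotting candidate shortcuts.

    examples: list of (text, label) pairs.
    Returns a dict mapping token -> (label, gap), where gap is the largest
    value of p(label | token) - p(label) over all labels.
    """
    # Label prior p(label) over the whole dataset.
    label_counts = Counter(label for _, label in examples)
    total = sum(label_counts.values())
    prior = {y: c / total for y, c in label_counts.items()}

    # Count each token at most once per example (assumed whitespace tokenizer).
    token_label = defaultdict(Counter)
    for text, label in examples:
        for tok in set(text.lower().split()):
            token_label[tok][label] += 1

    scores = {}
    for tok, counts in token_label.items():
        n = sum(counts.values())
        if n < min_count:  # skip rare tokens whose statistics are unreliable
            continue
        best_label, best_gap = max(
            ((y, counts[y] / n - prior[y]) for y in prior),
            key=lambda pair: pair[1],
        )
        scores[tok] = (best_label, best_gap)
    return scores


# Toy data (invented for illustration) with a negation shortcut:
# every sentence containing "not" is labeled "contradiction".
toy = [
    ("a man is not sleeping", "contradiction"),
    ("the dog is not outside", "contradiction"),
    ("a woman is not eating", "contradiction"),
    ("a man is sleeping", "entailment"),
    ("the dog is outside", "entailment"),
    ("a woman is eating", "neutral"),
]

ranked = sorted(shortcut_scores(toy).items(), key=lambda kv: kv[1][1], reverse=True)
print(ranked[0])  # ('not', ('contradiction', 0.5))
```

In a rewriting pipeline of the kind the abstract describes, the highest-ranked tokens would be natural targets for replacement; how RAZOR actually selects segments and alternatives is not recoverable from this summary.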
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: ALIGN · OPT
