Guide the Learner: Controlling Product of Experts Debiasing Method Based on Token Attribution Similarities
Ali Modarressi, Hossein Amirkhani, Mohammad Taher Pilehvar

TL;DR
This paper proposes a novel fine-tuning approach that uses token attribution similarities within a Product of Experts framework to better mitigate dataset biases and improve out-of-distribution performance in NLP tasks.
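The Product of Experts (PoE) objective mentioned above can be illustrated with a minimal plain-Python sketch. It follows the generic form used in the debiasing literature and assumes a frozen biased model. The main and biased models' log-probabilities are summed and renormalized, and the combined distribution is scored with cross-entropy against the gold label. This is not the authors' implementation, and the function names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def ce_loss(logits, gold):
    """Plain cross-entropy of a single model, for comparison."""
    return -log_softmax(logits)[gold]

def poe_loss(main_logits, bias_logits, gold):
    """Cross-entropy of the product of the main and biased experts.

    The biased model is assumed frozen: in a real training loop its
    log-probabilities would be treated as constants (no gradient),
    so only the main model is updated.
    """
    combined = [a + b for a, b in zip(log_softmax(main_logits), log_softmax(bias_logits))]
    return -log_softmax(combined)[gold]
```

With a uniform biased model, the PoE loss reduces to ordinary cross-entropy. When the biased model is confident in the gold label, the loss shrinks. For example, `poe_loss([0.0, 0.0, 0.0], [3.0, 0.0, 0.0], 0)` is about 0.095, versus about 1.099 for `ce_loss([0.0, 0.0, 0.0], 0)`. The main model therefore receives less training signal from that example.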
Contribution
The paper introduces a bias mitigation method that uses token attribution similarity to control how strongly the Product of Experts objective down-weights each training example, improving OOD robustness without sacrificing in-distribution accuracy.
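One way an attribution similarity could control PoE is sketched below. This summary does not give the paper's formulation, so every design choice here is an assumption made for illustration. Per-token attribution scores are taken as given for both models (for example, from a gradient-based attribution method). Cosine similarity is used as the similarity measure, and the clipped similarity multiplies the biased model's log-probabilities inside the PoE combination. All names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length token attribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def controlled_poe_loss(main_logits, bias_logits, gold, main_attr, bias_attr):
    """HYPOTHETICAL PoE variant: the biased expert is scaled by attribution similarity.

    The similarity is clipped to [0, 1]. A similarity of 0 turns the loss into
    plain cross-entropy of the main model; a similarity of 1 recovers the
    standard PoE loss. The biased model is assumed frozen.
    """
    sim = max(0.0, cosine_similarity(main_attr, bias_attr))
    combined = [a + sim * b for a, b in zip(log_softmax(main_logits), log_softmax(bias_logits))]
    return -log_softmax(combined)[gold]
```

Under this assumed design, an example that the biased model gets right is down-weighted less when the main model's attributions differ from the biased model's. In that case the main model appears to rely on a different decision process. This mirrors the motivation stated in the abstract, but the interpolation scheme itself is only an illustrative choice.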
Findings
Improves OOD performance on NLP benchmarks
Maintains in-distribution accuracy
Outperforms baseline bias mitigation methods
Abstract
Several proposals have been put forward in recent years for improving out-of-distribution (OOD) performance through mitigating dataset biases. A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model. Here, the underlying assumption is that the biased model resorts to shortcut features. Hence, those training examples that are correctly predicted by the biased model are flagged as being biased and are down-weighted during the training of the main model. However, assessing the importance of an instance merely based on the predictions of the biased model may be too naive. It is possible that the prediction of the main model can be derived from another decision-making process that is distinct from the behavior of the biased model. To circumvent this, we introduce a fine-tuning strategy that incorporates the similarity between the…
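The re-weighting workaround described in the abstract can be sketched in a few lines. This sketch assumes one common form, in which each example's weight is one minus the biased model's probability for the gold label. Examples that the biased model predicts correctly and confidently therefore contribute little to the main model's cross-entropy. The function names are hypothetical, and this is a generic illustration rather than any particular paper's code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def example_weight(bias_logits, gold):
    """Weight = 1 - p_bias(gold); confidently 'easy' examples get weights near 0."""
    return 1.0 - math.exp(log_softmax(bias_logits)[gold])

def reweighted_loss(batch):
    """Mean weighted cross-entropy of the main model over a batch.

    Each item is (main_logits, bias_logits, gold). The biased model only
    supplies the weights; it is not trained here.
    """
    losses = [example_weight(b, y) * -log_softmax(m)[y] for m, b, y in batch]
    return sum(losses) / len(losses)
```

For instance, `example_weight([5.0, 0.0, 0.0], 0)` is close to 0, whereas `example_weight([0.0, 5.0, 0.0], 0)` is close to 1. The weight depends only on the biased model's prediction. This is the limitation the abstract points out: the weight ignores how the main model itself arrives at its decision.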
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification
