Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

Xiaochuang Han; Yulia Tsvetkov

arXiv:2110.03212 · cs.CL · October 8, 2021

TL;DR

This paper introduces influence tuning, a method that uses model interpretations to reduce reliance on spurious correlations in NLP models, leading to improved deconfounding and interpretability.

Contribution

The paper proposes influence tuning, a novel approach that leverages instance attribution to automatically unlearn spurious correlations in deep learning NLP models.

Findings

1. Influence tuning effectively reduces reliance on spurious patterns.
2. It outperforms adversarial training baselines in deconfounding tasks.
3. The method improves model interpretability and robustness.

Abstract

Among the most critical limitations of deep learning NLP models are their lack of interpretability, and their reliance on spurious correlations. Prior work proposed various approaches to interpreting the black-box models to unveil the spurious correlations, but the research was primarily used in human-computer interaction scenarios. It still remains underexplored whether or how such model interpretations can be used to automatically "unlearn" confounding features. In this work, we propose influence tuning--a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels. We show that in a controlled setup, influence tuning can help deconfounding the model from spurious patterns in data, significantly…
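The abstract relies on instance attribution, that is, scoring how much a training example influences a test prediction. The sketch below illustrates one common family of such scores, a gradient dot product on a toy logistic-regression model. It is an illustrative assumption, not the authors' implementation: the function names (`loss_grad`, `influence`), the model, and the data are all hypothetical, and the tuning objective that updates the parameters is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad(w, x, y):
    """Gradient of the logistic loss for one example (x, y) w.r.t. weights w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def influence(w, train, test):
    """Gradient-dot-product attribution score (hypothetical helper).

    Positive: a gradient step on `train` would also lower the loss on `test`.
    Negative: the same step would raise the loss on `test`.
    """
    g_train = loss_grad(w, *train)
    g_test = loss_grad(w, *test)
    return sum(a * b for a, b in zip(g_train, g_test))

# Toy setup: two features, fixed weights (all values are made up).
w = [0.5, -0.25]
test_example = ([1.0, 0.0], 1.0)
train_same_label = ([2.0, 0.0], 1.0)      # same feature, same label
train_opposite_label = ([2.0, 0.0], 0.0)  # same feature, opposite label

score_same = influence(w, train_same_label, test_example)
score_opposite = influence(w, train_opposite_label, test_example)
print(score_same > 0, score_opposite < 0)
```

In this toy case the training example that shares the test example's label receives a positive score and the one with the opposite label receives a negative score. A procedure in the spirit of the abstract would add a second objective that adjusts such scores toward a plausible interpretation, alongside the usual label-prediction loss.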

Code & Models

Repositories

xhan77/influence-tuning
PyTorch · Official

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning