Log-linear Guardedness and its Implications

Shauli Ravfogel; Yoav Goldberg; Ryan Cotterell

arXiv:2210.10012·cs.LG·May 14, 2024

TL;DR

This paper introduces the concept of log-linear guardedness to analyze how removing human-interpretable concepts affects downstream classifiers, revealing limitations of linear erasure methods in bias mitigation.

Contribution

It formally defines log-linear guardedness, analyzes its implications for binary and multiclass models, and uncovers inherent limitations of linear erasure techniques.

Findings

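01. In the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept.

02. A multiclass log-linear model can be constructed that indirectly recovers the erased concept in some cases.

03. Log-linear guardedness therefore has inherent limitations as a downstream bias mitigation technique.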

Abstract

Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model can be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations…
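To make the idea of an adversary being unable to predict the concept directly from the representation more concrete, here is a minimal numerical sketch. It is an illustration only and not the construction used in the paper: the synthetic data, the helper names `erase_direction` and `logistic_grad_at_constant`, and the choice to erase by projecting out the cross-covariance direction between the representation and a binary concept are all assumptions made for this example. The sketch checks that, after the projection, the gradient of the logistic loss at the constant (majority-rate) predictor vanishes on the sample. Because the logistic loss is convex, this means no log-linear adversary can do better than the constant predictor on that data.

```python
import numpy as np


def erase_direction(X, z):
    """Project X onto the orthogonal complement of the empirical
    cross-covariance direction Cov(X, z). Hypothetical helper for this sketch."""
    Xc = X - X.mean(axis=0)
    zc = z - z.mean()
    u = Xc.T @ zc / len(z)            # cross-covariance vector, shape (d,)
    u = u / np.linalg.norm(u)         # unit direction carrying the linear signal
    P = np.eye(X.shape[1]) - np.outer(u, u)  # symmetric orthogonal projection
    return X @ P, P


def logistic_grad_at_constant(X, z):
    """Gradient of the mean logistic loss with respect to the weights,
    evaluated at w = 0 and bias = logit(mean(z)), i.e. the constant predictor."""
    p = z.mean()
    return X.T @ (p - z) / len(z)


# Synthetic data (an assumption of this sketch): a binary concept z that is
# linearly encoded in the first coordinate of the representation X.
rng = np.random.default_rng(0)
n, d = 2000, 5
z = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * z

X_erased, P = erase_direction(X, z)

# Before erasure the constant predictor is far from optimal (large gradient);
# after erasure the gradient is zero up to floating-point error.
grad_before = np.linalg.norm(logistic_grad_at_constant(X, z))
grad_after = np.linalg.norm(logistic_grad_at_constant(X_erased, z))
print(f"gradient norm before erasure: {grad_before:.4f}")
print(f"gradient norm after erasure:  {grad_after:.2e}")
```

This only checks the direct, binary adversary on one sample. It says nothing about the multiclass case discussed in the abstract, where a downstream model may still recover the concept indirectly.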

Taxonomy

Methods: Softmax