TL;DR
Obliviator is a novel post-hoc method that erases unwanted attributes from learned representations by capturing complex nonlinear dependencies, and it shows how the trade-off between utility and erasure evolves as erasure proceeds.
Contribution
The paper introduces Obliviator, a gradual, kernel-based approach to nonlinear concept erasure that preserves utility better than existing methods and quantifies the cost of erasure.
Findings
Obliviator outperforms baseline methods in utility-erasure trade-offs.
The method reveals how the balance between attribute protection and utility preservation shifts over the course of erasure.
More capable models with disentangled representations benefit more from Obliviator.
Abstract
Concept erasure aims to remove unwanted attributes, such as social or demographic factors, from learned representations, while preserving their task-relevant utility. While the goal of concept erasure is protection against all adversaries, existing methods remain vulnerable to nonlinear ones. This vulnerability arises from their failure to fully capture the complex, nonlinear statistical dependencies between learned representations and unwanted attributes. Moreover, although the existence of a trade-off between utility and erasure is expected, its progression during the erasure process, i.e., the cost of erasure, remains unstudied. In this work, we introduce Obliviator, a post-hoc erasure method designed to fully capture nonlinear statistical dependencies. We formulate erasure from a functional perspective, leading to an optimization problem involving a composition of kernels that lacks…
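The abstract traces the weakness of existing methods to nonlinear dependencies that they fail to capture. The sketch below is purely illustrative. It uses the Hilbert-Schmidt Independence Criterion (HSIC), a standard kernel-based dependence measure. The abstract does not say that Obliviator uses HSIC. The toy data, the helper names `rbf_gram` and `hsic`, and the median-distance bandwidth are all assumptions. In the toy data, the attribute depends on the representation only through its square. Its Pearson correlation with the representation is therefore near zero, while its HSIC is clearly larger than that of an independent attribute.

```python
import numpy as np


def rbf_gram(v: np.ndarray) -> np.ndarray:
    """Gaussian (RBF) Gram matrix; bandwidth set by the median heuristic (an assumption)."""
    v = v.reshape(len(v), -1)
    sq_dists = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    # Median heuristic: sigma = median of the nonzero pairwise distances.
    sigma = np.sqrt(np.median(sq_dists[sq_dists > 0]))
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(z: np.ndarray, a: np.ndarray) -> float:
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(z)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix H
    k, l = rbf_gram(z), rbf_gram(a)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)


rng = np.random.default_rng(0)
n = 400
z = rng.uniform(-1.0, 1.0, size=n)           # hypothetical 1-D "representation"
a_dep = z ** 2 + 0.05 * rng.normal(size=n)   # attribute tied to z only nonlinearly
a_ind = rng.uniform(-1.0, 1.0, size=n)       # attribute independent of z

pearson_dep = np.corrcoef(z, a_dep)[0, 1]    # linear measure: close to zero here
hsic_dep = hsic(z, a_dep)                    # kernel measure: clearly positive
hsic_ind = hsic(z, a_ind)                    # kernel measure: near its small-sample floor

print(f"Pearson r(z, a_dep) = {pearson_dep:+.3f}")
print(f"HSIC(z, a_dep)      = {hsic_dep:.5f}")
print(f"HSIC(z, a_ind)      = {hsic_ind:.5f}")
```

The Gaussian kernel is characteristic, so population HSIC is zero only when the two variables are independent. A linear statistic offers no such guarantee, and this gap is the kind that a nonlinear adversary can exploit.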