TL;DR
This paper introduces Infusion, a method for subtly editing training data using influence functions to systematically shape model behavior across vision and language tasks.
Contribution
Infusion is a scalable influence-function-based framework that enables targeted training data edits to induce specific model behavior changes.
Findings
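On CIFAR-10, subtle edits to just 0.2% of the training documents (100 of 45,000) can be competitive with inserting a small number of explicit behavior examples.
Infusion transfers across architectures, suggesting that a single poisoned corpus can affect multiple independently trained models.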
It is most effective at amplifying behaviors already learned by the model.
Abstract
Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion on data poisoning tasks across vision and language domains. On CIFAR-10, we show that making subtle edits via Infusion to just 0.2% (100/45,000) of the training documents can be competitive with the baseline of inserting a small number of explicit behavior examples. We also find that Infusion transfers across architectures (ResNet ↔ CNN), suggesting a single poisoned corpus can affect multiple independently trained models. In preliminary language experiments, we characterize when our approach…
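The general idea in the abstract, using influence functions to compute small training-data perturbations that shift a chosen model behavior, can be illustrated on a toy problem. The sketch below is not the paper's implementation. It assumes a ridge regression model with an exact closed-form Hessian (rather than the scalable approximations the abstract mentions), and every name in it (`fit_ridge`, `measurement_grad_wrt_inputs`, `x_target`, `eps`, `k`) is hypothetical. It computes the gradient of a target prediction with respect to each training input via the implicit function theorem, nudges the `k` most influential inputs, retrains, and compares the predicted and actual change.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of (1/2n)||X theta - y||^2 + (lam/2)||theta||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)        # exact Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H

def measurement_grad_wrt_inputs(X, y, theta, H, x_target):
    """Gradient of f(theta*) = theta* @ x_target w.r.t. every training input.

    Implicit function theorem: d theta*/d x_i = -(1/n) H^{-1} M_i, where
    M_i = r_i I + x_i theta^T is the Jacobian of the per-example loss
    gradient w.r.t. x_i and r_i is the residual of example i.
    """
    n = X.shape[0]
    v = np.linalg.solve(H, x_target)         # H^{-1} grad_theta f
    r = X @ theta - y                        # residuals r_i
    # Row i holds -(1/n) M_i^T v = -(1/n) (r_i v + theta (x_i @ v)).
    return -(r[:, None] * v[None, :] + (X @ v)[:, None] * theta[None, :]) / n

# Synthetic data (assumption: purely illustrative, unrelated to CIFAR-10).
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
x_target = rng.normal(size=d)                # behavior to shift: prediction here

theta, H = fit_ridge(X, y, lam)
f_before = theta @ x_target

# Select the k training inputs whose edits move the measurement the most.
k, eps = 10, 0.1
G = measurement_grad_wrt_inputs(X, y, theta, H, x_target)
norms = np.linalg.norm(G, axis=1)
edit_idx = np.argsort(norms)[-k:]

# Move each selected input a distance eps along its gradient direction.
X_edit = X.copy()
X_edit[edit_idx] += eps * G[edit_idx] / norms[edit_idx, None]
predicted_change = eps * norms[edit_idx].sum()   # first-order estimate

# Retrain on the edited data and measure the real effect.
theta_new, _ = fit_ridge(X_edit, y, lam)
actual_change = theta_new @ x_target - f_before

print(f"predicted change: {predicted_change:.4f}")
print(f"actual change:    {actual_change:.4f}")
```

Only 10 of the 200 inputs are edited, each by a small bounded step, yet the retrained model's prediction at `x_target` moves in the intended direction. For deep networks the exact Hessian solve is infeasible, which is why an approach at the scale described in the abstract would need approximate inverse-Hessian-vector products.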
