Learning Where to Edit Vision Transformers
Yunqiao Yang, Long-Kai Huang, Shengzhuang Chen, Kede Ma, Ying Wei

TL;DR
This paper introduces a method for editing vision transformers (ViTs) that locates a sparse subset of model parameters and fine-tunes only those. This improves the correction of errors caused by subpopulation shifts while preserving locality, meaning minimal side effects on unrelated inputs.
Contribution
It proposes a locate-then-edit approach using a meta-learned hypernetwork to identify editable parameters in ViTs, addressing the challenge of where to edit in vision models.
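The locate-then-edit idea can be made concrete with a minimal sketch under strong simplifying assumptions. A list of per-parameter scores stands in for the hypernetwork's output, a top-k rule turns those scores into a binary mask, and gradient descent then updates only the masked weights of a toy linear scorer. The names `topk_mask` and `masked_edit`, the top-k binarization, the squared loss, and the element-wise (rather than structured) mask are all hypothetical illustrative choices, not the paper's implementation.

```python
def topk_mask(scores, k):
    """Return a binary mask (1 = editable) keeping the k highest-scoring parameters.

    `scores` is a hypothetical stand-in for per-parameter hypernetwork outputs.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def masked_edit(weights, mask, x, target, lr=0.1, steps=100):
    """Fit a linear scorer w.x to `target` on one failure sample, updating only masked weights.

    Uses plain gradient descent on the squared error 0.5 * (w.x - target)**2.
    """
    w = list(weights)
    for _ in range(steps):
        err = sum(wi * xi for wi, xi in zip(w, x)) - target
        # d/dw_i of the loss is err * x_i; multiplying by the mask freezes unmasked weights.
        w = [wi - lr * err * xi * mi for wi, xi, mi in zip(w, x, mask)]
    return w


mask = topk_mask([0.9, 0.1, 0.8, 0.2], k=2)  # [1, 0, 1, 0]
edited = masked_edit([0.5, -0.2, 0.1, 0.3], mask, x=[1.0, 1.0, 1.0, 1.0], target=2.0)
print(round(sum(edited), 6))  # 2.0 (prediction w.x, since x is all ones)
print(edited[1], edited[3])   # -0.2 0.3 (unmasked weights untouched)
```

Because the gradient is multiplied by the mask, unmasked weights keep exactly their pre-edit values. In this toy setting, that is how restricting the edit to a sparse subset limits its side effects.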
Findings
Outperforms existing methods on a new subpopulation shift benchmark
Effectively identifies sparse, responsive model parameters for editing
Enables adjustable trade-offs between generalization and locality
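The two competing goals can be scored with simple ratios. Generalization is accuracy on neighboring failure samples after the edit, and locality is the fraction of unrelated samples whose prediction the edit leaves unchanged. The sketch below assumes these common definitions; the benchmark's exact metrics may differ. `pick_setting` is a hypothetical helper that shows one way to choose an operating point on the trade-off. The sweep values are invented for illustration, and treating mask sparsity as the knob is an assumption.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels (e.g. on neighboring failures)."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def locality(preds_before, preds_after):
    """Fraction of unrelated samples whose prediction is unchanged by the edit."""
    if len(preds_before) != len(preds_after) or not preds_before:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(preds_before, preds_after)) / len(preds_before)


def pick_setting(results, min_locality):
    """Given (setting, generalization, locality) tuples, return the setting with the
    highest generalization whose locality meets the floor, or None if none does."""
    feasible = [r for r in results if r[2] >= min_locality]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r[1])[0]


# Hypothetical sweep over mask sparsity (numbers invented for illustration).
sweep = [("k=8", 0.60, 0.99), ("k=64", 0.80, 0.95), ("k=512", 0.90, 0.85)]
print(pick_setting(sweep, min_locality=0.9))  # k=64
```

Reporting both numbers side by side makes the trade-off visible. A locality floor then turns "adjustable" into a concrete selection rule.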
Abstract
Model editing aims to data-efficiently correct predictive errors of large pre-trained models while ensuring generalization to neighboring failures and locality to minimize unintended effects on unrelated examples. While significant progress has been made in editing Transformer-based large language models, effective strategies for editing vision Transformers (ViTs) in computer vision remain largely untapped. In this paper, we take initial steps towards correcting predictive errors of ViTs, particularly those arising from subpopulation shifts. Taking a locate-then-edit approach, we first address the where-to-edit challenge by meta-learning a hypernetwork on CutMix-augmented data generated for editing reliability. This trained hypernetwork produces generalizable binary masks that identify a sparse subset of structured model parameters, responsive to real-world failure samples. Afterward,…
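The abstract states that the hypernetwork is meta-trained on CutMix-augmented data. For reference, a minimal sketch of the standard CutMix operation follows. A random box from one image is pasted into another, and the labels are mixed in proportion to the pasted area. To keep it dependency-free, images are plain H x W nested lists and labels are one-hot lists. The mixing ratio `lam` is passed in, although CutMix normally samples it from a Beta distribution. This summary does not describe how the paper turns CutMix samples into editing data, so the sketch shows only the augmentation itself.

```python
import math
import random


def cutmix(img_a, img_b, label_a, label_b, lam, rng=random):
    """Paste a random box from img_b into a copy of img_a and mix the labels.

    lam is the target fraction of img_a that survives (0 <= lam <= 1).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    h, w = len(img_a), len(img_a[0])
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    # Random box center; the box is clipped at the image border.
    cy, cx = rng.randrange(h), rng.randrange(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = [row[:] for row in img_a]  # leave the input image unmodified
    for y in range(y1, y2):
        for x in range(x1, x2):
            mixed[y][x] = img_b[y][x]
    # Recompute lam from the actual (possibly clipped) pasted area.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    label = [lam_adj * a + (1.0 - lam_adj) * b for a, b in zip(label_a, label_b)]
    return mixed, label
```

Recomputing the ratio from the clipped box keeps the soft label consistent with the pixels actually pasted, which matters when the box center falls near an edge.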
Taxonomy
Topics: Machine Learning in Materials Science
Methods: HyperNetwork
