Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

TL;DR
This paper introduces a sparse-coding-inspired fine-tuning method for transformers that improves interpretability and performance in generative tasks by representing fine-tuned features as sparse combinations of fundamental feature dictionary atoms.
Contribution
The paper proposes a novel sparse fine-tuning framework for transformers that enhances interpretability and efficiency by using feature dictionaries and sparse coefficients.
Findings
Improves image editing performance via atom removal.
Outperforms baseline methods in text-to-image concept customization.
Enhances interpretability of model updates.
Abstract
Large pre-trained transformers have revolutionized artificial intelligence across various domains, and fine-tuning remains the dominant approach for adapting these models to downstream tasks due to the cost of training from scratch. However, in existing fine-tuning methods, the updated representations are formed as a dense combination of modified parameters, making it challenging to interpret their contributions and understand how the model adapts to new tasks. In this work, we introduce a fine-tuning framework inspired by sparse coding, where fine-tuned features are represented as a sparse combination of basic elements, i.e., feature dictionary atoms. The feature dictionary atoms function as fundamental building blocks of the representation, and tuning atoms allows for seamless adaptation to downstream tasks. Sparse coefficients then serve as indicators of atom importance, identifying…
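The idea of representing a feature as a sparse combination of dictionary atoms, with coefficient magnitude indicating atom importance, can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses ISTA, a standard sparse coding solver that the abstract does not name, on a random dictionary. All names (`sparse_code`, `remove_atom`, `D`, `lam`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t; entries within t of zero become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, D, a, lam):
    """Lasso objective: 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximate the sparse coefficients of x over dictionary D using ISTA.

    Assumption: ISTA stands in for whatever solver or learned encoder
    the paper actually uses.
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)  # gradient of the reconstruction term
        a = soft_threshold(a - step * grad, step * lam)
    return a

def remove_atom(a, idx):
    """Return a copy of the coefficients with one atom switched off."""
    out = a.copy()
    out[idx] = 0.0
    return out

# Toy data: a 16-dimensional feature built from 2 of 32 unit-norm atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0, keepdims=True)
true_a = np.zeros(32)
true_a[[3, 17]] = [1.5, -2.0]
x = D @ true_a

a = sparse_code(x, D)
ranking = np.argsort(-np.abs(a))          # atoms ordered by coefficient magnitude
edited = D @ remove_atom(a, ranking[0])   # feature with the top-ranked atom removed
```

Here `ranking` orders atoms by the magnitude of their coefficients, which mirrors the role the abstract assigns to sparse coefficients as importance indicators. `remove_atom` is a toy analogue of the atom removal mentioned under Findings.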
Taxonomy
Topics: Parallel Computing and Optimization Techniques
