HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation

Geyuan Zhang; Xiaofei Zhou; Chuheng Chen

arXiv:2409.13501 · cs.CL · September 23, 2024

TL;DR

HUT introduces a Hadamard-based parameter-efficient fine-tuning method that preserves parameter correlation, enhances expressiveness, and reduces computational costs for large pre-trained language models.

Contribution

The paper proposes the Hadamard Updated Transformation (HUT), a novel PEFT approach that constructs a direct transformation from the original parameters to the updated parameters, improving both efficiency and expressiveness.
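To make "a direct transformation from the original to the updated parameters" concrete, here is a minimal plain-Python sketch. This summary does not give the exact HUT formula, so the sketch assumes one plausible Hadamard-style form, W' = W ⊙ (J + AB), where J is the all-ones matrix and A (d × r) and B (r × k) are trainable low-rank factors. The form, the helper `hadamard_update`, and the toy values are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a direct "updated transformation": W' = W ⊙ (J + A @ B).
# Assumption: the exact HUT formula may differ; hadamard_update, A, B are illustrative.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (m x r) matrix and an (r x n) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard_update(w: Matrix, a: Matrix, b: Matrix) -> Matrix:
    """Return W ⊙ (J + A B): every original weight is rescaled elementwise."""
    scale = matmul(a, b)  # low-rank (rank <= r) modulation matrix
    return [[w[i][j] * (1.0 + scale[i][j]) for j in range(len(w[0]))]
            for i in range(len(w))]


W = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]   # frozen pre-trained weights (d=2, k=3)
A = [[0.1], [0.2]]       # trainable factor, d x r with r = 1
B = [[1.0, 1.0, -0.5]]   # trainable factor, r x k

W_new = hadamard_update(W, A, B)
print(W_new)
```

Because each updated entry is a rescaled copy of the corresponding original weight, entries that are zero in W stay zero, and when AB = 0 the transformation reduces to the identity. This is one concrete sense in which a multiplicative, direct transformation keeps the updated parameters tied to the original ones.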

Findings

1. HUT achieves performance comparable or superior to that of existing PEFT methods.
2. HUT significantly reduces the computational complexity of fine-tuning.
3. Theoretical analysis and experiments on RoBERTa and GPT-2 validate its effectiveness.

Abstract

Fine-tuning pre-trained language models for downstream tasks has achieved impressive results in NLP. However, fine-tuning all parameters becomes impractical due to the rapidly increasing size of model parameters. To address this, Parameter Efficient Fine-Tuning (PEFT) methods update only a subset of parameters. Most PEFT methods, such as LoRA, use incremental updates, which involve adding learned weight matrix increments to the original parameters. Although effective, these methods face limitations in capturing complex parameter dynamics and do not maintain a strong correlation between the original and updated parameters. To overcome these challenges, we propose the direct Updated Transformation (UT) paradigm, which constructs a transformation directly from the original to the updated parameters. This approach ensures that the correlation between the original and updated parameters is…
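For contrast, the incremental update the abstract attributes to LoRA adds a learned low-rank increment to the frozen weights, W' = W + (α/r)BA. The plain-Python sketch below uses toy values; in LoRA, B is initialized to zero, and nonzero values are used here only to show the effect of the increment. The helpers `lora_update` and `trainable_params` are illustrative names, not a library API.

```python
# Illustrative sketch of an incremental (LoRA-style) update: W' = W + (alpha / r) * B @ A.
# Toy values; lora_update and trainable_params are hypothetical helper names.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (m x r) matrix and an (r x n) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_update(w: Matrix, b: Matrix, a: Matrix, alpha: float, r: int) -> Matrix:
    """Add the scaled low-rank increment B A to W; the increment does not depend on W."""
    delta = matmul(b, a)
    s = alpha / r
    return [[w[i][j] + s * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning count, low-rank count) for one d x k weight matrix."""
    return d * k, r * (d + k)


W = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]   # frozen pre-trained weights (d=2, k=3)
B = [[0.1], [0.2]]       # trainable, d x r with r = 1
A = [[1.0, 1.0, -0.5]]   # trainable, r x k

W_new = lora_update(W, B, A, alpha=1.0, r=1)
print(W_new)
print(trainable_params(768, 768, 8))  # prints (589824, 12288)
```

Because the increment BA is computed independently of W, an entry that is zero in W can become nonzero after the update. The parameter count shows why low-rank updates are cheap: for a 768 × 768 matrix with r = 8, full fine-tuning trains 589,824 values, while the two low-rank factors hold 12,288.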

Taxonomy

Topics: Digital Filter Design and Implementation · Advanced Wireless Communication Techniques · Photonic and Optical Devices

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Dense Connections · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Linear Warmup With Cosine Annealing · Adam · WordPiece