Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

Xinhao Yao; Xiaolin Hu; Shenzhi Yang; Yong Liu

arXiv:2406.03768·cs.LG·October 15, 2024

TL;DR

This paper demonstrates that SVD-based weight pruning can improve in-context learning performance in large language models, with deeper layer pruning often yielding more stable improvements, supported by theoretical analysis and experimental validation.
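As a reminder of what SVD-based pruning means in general, the standard rank-k truncation of a weight matrix can be written as below. This is the textbook construction, not a formula quoted from the paper; the symbols W, r, and k are notation assumed here for illustration, and which matrices are pruned and how k is chosen are not specified on this page.

```latex
% Assumed notation: W is an m x n weight matrix of rank r.
W = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Pruning keeps only the k largest singular values (k < r):
W_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}

% The discarded singular values determine the approximation error:
\lVert W - W_k \rVert_F^2 = \sum_{i=k+1}^{r} \sigma_i^2
```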

Contribution

It provides the first theoretical explanation for how SVD-based pruning enhances ICL, and introduces a simple, effective algorithm for improving ICL in downstream tasks.

Findings

01. Pruning weights in deep layers often leads to more stable ICL improvements than pruning weights in shallow layers.

02. SVD-based pruning enhances ICL performance on benchmark datasets.

03. Theoretical analysis explains the mechanisms behind pruning benefits.

Abstract

Pre-trained large language models (LLMs) based on the Transformer architecture have demonstrated striking in-context learning (ICL) abilities: given a few demonstration input-label pairs, they can predict the label of an unseen input without any parameter updates. In this paper, we report an exciting phenomenon: SVD-based weight pruning can enhance ICL performance and, more surprisingly, pruning weights in deep layers often yields more stable performance improvements than pruning weights in shallow layers. However, the mechanism underlying these findings remains an open question. To explain them, we conduct an in-depth theoretical analysis, presenting the implicit gradient descent (GD) trajectories of ICL and deriving mutual-information-based generalization bounds for ICL via the full implicit GD trajectories. This analysis lets us reasonably explain the surprising experimental findings. Besides, based on…
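To make the pruning operation concrete, here is a minimal NumPy sketch of truncating a single weight matrix to its largest singular values. It is an illustration under stated assumptions, not the authors' implementation: the function name `svd_prune`, the `keep_ratio` parameter, and the random 8×6 matrix are all hypothetical, and the choice of which layers and matrices to prune is not covered here.

```python
import numpy as np


def svd_prune(weight, keep_ratio):
    """Return a low-rank copy of `weight` that keeps only the largest singular values.

    `keep_ratio` is a hypothetical knob: the fraction of singular values retained.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Thin SVD: weight = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Number of singular values to keep (at least one).
    k = max(1, int(np.ceil(keep_ratio * s.size)))
    # Rebuild the matrix from the top-k singular triplets only.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy stand-in for a layer's weight matrix (assumption: random data, not a real model).
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))
w_pruned = svd_prune(w, keep_ratio=0.5)

# The pruned matrix has the same shape but lower rank.
pruned_rank = np.linalg.matrix_rank(w_pruned)
# Frobenius error equals the norm of the discarded singular values.
singular_values = np.linalg.svd(w, compute_uv=False)
error = np.linalg.norm(w - w_pruned)
```

The pruned matrix is a drop-in replacement for the original because its shape is unchanged; only its rank is reduced. In a real model the same operation would be applied to selected weight matrices of a chosen layer, which this sketch does not attempt to reproduce.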

Code & Models

Repositories

chen123ctrls/enhancingicl_svdpruning
PyTorch · Official

Videos

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective · SlidesLive

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention