Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

Lu Yin; Ajay Jaiswal; Shiwei Liu; Souvik Kundu; Zhangyang Wang

arXiv:2310.02277 · cs.LG · May 1, 2026


TL;DR

This paper challenges the belief that small weights in large language models are redundant, showing they encode crucial knowledge for difficult tasks and that pruning them irreversibly impairs performance.

Contribution

It introduces the Junk DNA Hypothesis, demonstrating that pruning small weights monotonically degrades performance on hard tasks and that quantization does not have the same effect.

Findings

01. Pruning small weights causes monotonic performance drops on difficult tasks.

02. Small weights encode essential knowledge for challenging downstream tasks.

03. Quantization does not exhibit the monotonic degradation seen with pruning.
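To make the contrast in finding 03 concrete at the level of individual weights, the sketch below applies per-tensor symmetric round-to-nearest quantization. This scheme is an assumption chosen for illustration, since this summary does not say which quantizer the paper uses, and the helper names `quantize_symmetric` and `fraction_zeroed` are hypothetical. Magnitude pruning at sparsity s zeroes a fraction s of the weights by construction. Quantization only zeroes weights smaller than half a quantization step, so at 8 bits every small toy weight here survives as a nonzero value. The sketch says nothing about downstream task accuracy.

```python
def quantize_symmetric(weights, bits):
    """Per-tensor symmetric round-to-nearest quantization, returned dequantized."""
    if bits < 2:
        raise ValueError("bits must be >= 2")
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0 for _ in weights]
    scale = max_abs / qmax  # one step of the quantization grid
    # Python's round() rounds halves to even; acceptable for a sketch.
    return [round(w / scale) * scale for w in weights]


def fraction_zeroed(original, compressed):
    """Fraction of originally nonzero weights that the compression set to 0."""
    nonzero = [i for i, w in enumerate(original) if w != 0]
    zeroed = sum(1 for i in nonzero if compressed[i] == 0)
    return zeroed / len(nonzero) if nonzero else 0.0


if __name__ == "__main__":
    # Toy values, purely illustrative (not weights from any real model).
    w = [0.5, -0.01, 0.2, -0.9, 0.03, -0.4, 0.07, 0.6]
    print(fraction_zeroed(w, quantize_symmetric(w, 8)))  # 0.0
    print(fraction_zeroed(w, quantize_symmetric(w, 4)))  # 0.25
```

At very low bit widths the grid becomes coarse enough that the smallest weights also round to zero (two of the eight toy weights at 4 bits), so the contrast with pruning is strongest at moderate precision.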

Abstract

We present the Junk DNA Hypothesis by adopting a novel task-centric angle on the pre-trained weights of large language models (LLMs). It has been widely believed that weights in LLMs contain significant redundancy, leading to the conception that a considerable fraction of the parameters can be removed by pruning without compromising performance. Contrary to this belief, this paper presents a counter-argument: small-magnitude pre-trained weights encode vital knowledge essential for tackling difficult downstream tasks. This manifests as a monotonic relationship between task difficulty and the performance drop of downstream tasks as we prune more pre-trained weights by magnitude. Moreover, we reveal that removing these seemingly inconsequential weights can result in irreparable loss of knowledge and performance degradation on difficult tasks, even when downstream continual training…
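Pruning "by magnitude", as described in the abstract, can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' code (their implementation is in the VITA-Group repository linked below). The function name `magnitude_prune` and the use of a single threshold over one flat list are assumptions; this summary does not state whether the paper ranks weights per layer or globally. The function zeroes the requested fraction of smallest-magnitude weights and returns a 0/1 mask.

```python
def magnitude_prune(weights, sparsity):
    """One-shot unstructured magnitude pruning over a flat list of floats.

    Zeroes the round(sparsity * n) entries with the smallest |w| and returns
    (pruned_weights, mask), where mask[i] is 1 if weight i is kept, else 0.
    Ties in |w| are broken by original position (sorted() is stable).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n = len(weights)
    k = round(sparsity * n)  # how many weights to remove
    # Indices ordered from smallest to largest magnitude; the first k are pruned.
    order = sorted(range(n), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    mask = [0 if i in pruned else 1 for i in range(n)]
    return [w * m for w, m in zip(weights, mask)], mask


if __name__ == "__main__":
    # Toy values, purely illustrative (not weights from any real model).
    w = [0.5, -0.01, 0.2, -0.9, 0.03, -0.4, 0.07, 0.6]
    for s in (0.0, 0.25, 0.5, 0.75):
        _, mask = magnitude_prune(w, s)
        print(s, sum(mask))  # surviving weights: 8, 6, 4, 2
```

Because the magnitude ordering is fixed, each higher sparsity prunes a superset of the weights removed at a lower one. Sweeping `sparsity` therefore yields the nested sequence of pruned models along which a monotonic performance trend can be measured.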


Code & Models

Repositories

VITA-Group/Junk_DNA_Hypothesis.git (GitHub)
