On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

Safal Shrestha; Anubhav Shrestha; Aadim Nepal; Minwu Kim; Keith Ross

arXiv:2602.01997 · cs.LG · April 13, 2026

TL;DR

Layer pruning can compress large language models while retaining strong classification performance, but it significantly impairs generative reasoning, and finetuning on large datasets recovers only a limited share of it.
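
This page does not say which layers the authors remove or how they choose them. As an illustration only, the sketch below assumes one heuristic used in prior layer-pruning work: remove the contiguous block of n layers whose input and output hidden states are closest in angular distance, on the premise that such a block changes the residual stream least. The layers are toy Python functions on plain lists, not a real transformer, and `angular_distance`, `run`, `choose_block`, and `prune` are hypothetical names; in practice the distance would be averaged over tokens from a calibration set.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def angular_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two vectors, scaled to [0, 1] (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos) / math.pi


def run(layers: List[Layer], x: Vector) -> List[Vector]:
    """Return the hidden state entering each layer, plus the final output."""
    states = [x]
    for layer in layers:
        x = layer(x)
        states.append(x)
    return states


def choose_block(states: List[Vector], n: int) -> int:
    """Start index l whose block layers[l : l + n] changes the state least."""
    num_layers = len(states) - 1
    scores = [angular_distance(states[l], states[l + n]) for l in range(num_layers - n + 1)]
    return min(range(len(scores)), key=scores.__getitem__)


def prune(layers: List[Layer], start: int, n: int) -> List[Layer]:
    """Drop the contiguous block layers[start : start + n]."""
    return layers[:start] + layers[start + n:]


# Toy "residual" layers: each adds a fixed update vector to the state.
def residual(update: Vector) -> Layer:
    return lambda v: [a + b for a, b in zip(v, update)]


layers = [residual(u) for u in ([0, 1], [0, 1], [0.01, 0], [0.01, 0], [-3, 0])]
states = run(layers, [1.0, 0.0])
start = choose_block(states, n=2)
pruned = prune(layers, start, n=2)
print(start, len(pruned))  # prints: 2 3
```

This criterion only measures how far a block moves the hidden state; it says nothing directly about which capabilities the removed block carried.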

Contribution

This paper demonstrates the fundamental limitations of layer pruning for preserving generative reasoning in large language models, highlighting the difficulty of restoring reasoning skills post-pruning.

Findings

1. Pruning causes the loss of key algorithmic capabilities, such as arithmetic and balanced parenthesis generation.
2. Supervised finetuning recovers up to 90% of baseline performance on classification tasks, but recovery on generative reasoning remains limited.
3. Even extensive post-training on large datasets fails to restore the original reasoning abilities.
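
The findings single out balanced parenthesis generation as a capability lost after pruning. The sketch below shows one way such a probe could be scored, assuming (hypothetically) that the model is given an unclosed prefix and must emit the missing closing brackets. The bracket set and the function names are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List

# Assumed bracket set; the paper's probe may use only round parentheses.
CLOSER_TO_OPENER = {")": "(", "]": "[", "}": "{"}
OPENER_TO_CLOSER = {v: k for k, v in CLOSER_TO_OPENER.items()}


def open_stack(text: str) -> List[str]:
    """Return the still-open brackets in text, raising on a mismatched closer."""
    stack: List[str] = []
    for ch in text:
        if ch in OPENER_TO_CLOSER:
            stack.append(ch)
        elif ch in CLOSER_TO_OPENER:
            if not stack or stack.pop() != CLOSER_TO_OPENER[ch]:
                raise ValueError(f"unmatched {ch!r}")
    return stack


def is_balanced(text: str) -> bool:
    """True if every bracket in text is closed in the right order."""
    try:
        return not open_stack(text)
    except ValueError:
        return False


def expected_closing(prefix: str) -> str:
    """Closers that would balance prefix, innermost first."""
    return "".join(OPENER_TO_CLOSER[c] for c in reversed(open_stack(prefix)))


def completion_correct(prefix: str, completion: str) -> bool:
    """Score a model completion: prefix + completion must be balanced."""
    return is_balanced(prefix + completion)


print(expected_closing("f(a[0], {b"))  # prints: })
print(completion_correct("((", "))"), completion_correct("((", ")"))  # prints: True False
```

Non-bracket characters are ignored, so the same check works on code or arithmetic expressions embedded in free-form output.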

Abstract

Recent work has shown that layer pruning can effectively compress large language models (LLMs) while retaining strong performance on classification benchmarks, often with little or no finetuning. In contrast, generative reasoning tasks, such as GSM8K and HumanEval+, exhibit substantially weaker recovery. We show that beyond surface-level text degradation, pruning leads to a loss of key algorithmic capabilities, including arithmetic computation and balanced parenthesis generation. Under realistic post-training constraints, without access to pretraining-scale data or compute, we evaluate a minimal recovery strategy based on supervised finetuning with self-generated responses. This approach recovers up to 90% of baseline performance on classification tasks, but recovery for generative reasoning remains fundamentally limited. Notably, even models finetuned on ~400B…
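
The abstract's recovery strategy is supervised finetuning on self-generated responses. Assuming this means responses sampled from the original, unpruned model for a set of prompts (the page does not spell out the details), the sketch below builds such training examples and computes a recovery ratio. `generate` and `tokenize` are hypothetical stand-ins for a model's sampling call and tokenizer; masking prompt tokens with -100 follows the default `ignore_index` of PyTorch's `CrossEntropyLoss`, a common convention rather than something this page states.

```python
from typing import Callable, Dict, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(prompt_ids: List[int], response_ids: List[int]) -> Dict[str, List[int]]:
    """Concatenate prompt and response; only response tokens contribute to the loss."""
    return {
        "input_ids": prompt_ids + response_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + response_ids,
    }


def build_self_generated_dataset(
    prompts: List[str],
    generate: Callable[[str], str],        # hypothetical: samples from the unpruned model
    tokenize: Callable[[str], List[int]],  # hypothetical tokenizer
) -> List[Dict[str, List[int]]]:
    return [build_sft_example(tokenize(p), tokenize(generate(p))) for p in prompts]


def recovery(score: float, baseline: float, chance: float = 0.0) -> float:
    """Fraction of the unpruned baseline's score retained, measured above a chance floor."""
    return (score - chance) / (baseline - chance)


# Toy stand-ins: character codes as "tokens", a fixed string as the "model" output.
data = build_self_generated_dataset(
    ["hi"], generate=lambda p: "ok", tokenize=lambda s: [ord(c) for c in s]
)
print(data[0]["labels"])  # prints: [-100, -100, 111, 107]
```

With `chance=0.0`, a pruned-and-finetuned score of 0.45 against a baseline of 0.50 gives a recovery of 0.9, i.e. 90% of baseline. Whether the paper measures recovery relative to a chance floor is not stated here; the `chance` parameter is an assumption.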

Code & Models

Repositories

safal312/on-the-limits-of-layer-pruning (GitHub)
