Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

Pere Martra

arXiv:2512.22671 · cs.CL · May 7, 2026


TL;DR

This paper investigates how width pruning of GLU-MLP layers in Llama-3.2 models affects different capabilities selectively. It reveals a dichotomy: factual knowledge degrades while instruction-following improves, which challenges the assumption that pruning degrades all capabilities uniformly.

Contribution

The paper systematically characterizes which capabilities width pruning preserves and which it erodes, relating the loss of factual knowledge to improved behavioral alignment and to efficiency trade-offs.

Findings

01 Pruning improves instruction-following performance (+46% to +75% on IFEval).

02 Factual knowledge capacity degrades, as measured by MMLU.

03 Pruned models achieve energy reductions of up to 23%.

Abstract

Structured width pruning of GLU-MLP layers, guided by the Maximum Absolute Weight (MAW) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks that rely on parametric knowledge (e.g., MMLU, GSM8K) and perplexity degrade predictably, instruction-following capabilities improve substantially (+46% to +75% on IFEval for the Llama-3.2-1B and 3B models), and multi-step reasoning remains robust (MuSR). This pattern challenges the prevailing assumption that pruning induces uniform degradation. We evaluated seven expansion-ratio configurations using comprehensive benchmarks that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively modulates cognitive…
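The abstract names the expansion ratio (the MLP intermediate size divided by the hidden size) as the parameter that width pruning reduces. The following is a minimal, pure-Python sketch of structured width pruning for a single GLU-MLP block under a hypothetical scoring rule: each intermediate neuron is scored as max|gate_proj row| + max|up_proj row|, and only the highest-scoring neurons are kept. That exact formula, the function names `maw_scores` and `prune_glu_mlp`, and the toy weights are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def maw_scores(gate: Matrix, up: Matrix) -> List[float]:
    """Score each intermediate neuron (hypothetical MAW-style variant).

    gate and up have shape [intermediate][hidden]. Assumed score for
    neuron i: max|gate[i]| + max|up[i]|.
    """
    return [
        max(abs(w) for w in g_row) + max(abs(w) for w in u_row)
        for g_row, u_row in zip(gate, up)
    ]


def prune_glu_mlp(
    gate: Matrix, up: Matrix, down: Matrix, target_ratio: float
) -> Tuple[Matrix, Matrix, Matrix, List[int]]:
    """Keep the top-scoring neurons so that intermediate/hidden ~= target_ratio.

    down has shape [hidden][intermediate], so pruning removes its columns.
    """
    hidden = len(gate[0])
    intermediate = len(gate)
    keep_n = max(1, round(target_ratio * hidden))
    if keep_n > intermediate:
        raise ValueError("target_ratio exceeds the current expansion ratio")
    scores = maw_scores(gate, up)
    ranked = sorted(range(intermediate), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:keep_n])  # restore original neuron order
    new_gate = [gate[i] for i in keep]  # drop rows of gate_proj
    new_up = [up[i] for i in keep]  # drop matching rows of up_proj
    new_down = [[row[i] for i in keep] for row in down]  # drop down_proj columns
    return new_gate, new_up, new_down, keep


if __name__ == "__main__":
    # Toy block: hidden=2, intermediate=4, so the expansion ratio is 4/2 = 2.0.
    gate = [[0.1, -0.2], [0.9, 0.0], [0.05, 0.05], [-0.3, 0.4]]
    up = [[0.1, 0.1], [0.2, -0.1], [0.0, 0.01], [0.5, 0.0]]
    down = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    g, u, d, keep = prune_glu_mlp(gate, up, down, target_ratio=1.0)
    print("kept neurons:", keep)  # kept neurons: [1, 3]
    print("new expansion ratio:", len(g) / len(g[0]))  # new expansion ratio: 1.0
```

Because whole neurons are removed, gate_proj and up_proj lose rows while down_proj loses the matching columns, so the result is still a dense, smaller MLP that needs no sparse kernels. For scale, the public Llama-3.2-1B configuration uses hidden_size 2048 and intermediate_size 8192, an expansion ratio of 4.0.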
