TraceNAS: Zero-shot LLM Pruning via Gradient Trace Correlation

Prajna G. Malettira; Manish Nagaraj; Arjun Roy; Shubham Negi; Kaushik Roy

arXiv:2602.02891 · cs.LG · February 4, 2026

TL;DR

TraceNAS is a training-free neural architecture search (NAS) method for structured pruning of large language models. It uses a zero-shot proxy to find non-uniformly pruned configurations that stay aligned with the loss landscape, avoiding the computational cost of training-aware pruning.

Contribution

TraceNAS introduces a zero-shot NAS framework for LLM pruning that jointly searches over depth and width. It captures global structural dependencies without any training, unlike methods that score layers, attention heads, or weight channels in isolation.

Findings

01. Achieves 10× reduction in GPU hours compared to training-aware methods.

02. Maintains competitive performance on benchmarks with significantly less computation.

03. Effective in pruning Llama and Qwen models while preserving accuracy.

Abstract

Structured pruning is essential for efficient deployment of Large Language Models (LLMs). The varying sensitivity of LLM sub-blocks to pruning necessitates the identification of optimal non-uniformly pruned models. Existing methods evaluate the importance of layers, attention heads, or weight channels in isolation. Such localized focus ignores the complex global structural dependencies that exist across the model. Training-aware structured pruning addresses global dependencies, but its computational cost can be just as expensive as post-pruning training. To alleviate the computational burden of training-aware pruning and capture global structural dependencies, we propose TraceNAS, a training-free Neural Architecture Search (NAS) framework that jointly explores structured pruning of LLM depth and width. TraceNAS identifies pruned models that maintain a high degree of loss landscape…
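The abstract is cut off before it describes the proxy itself, so the following is only a toy illustration of the general idea the title suggests: ranking pruning candidates, without any training, by how well their gradients correlate with those of the unpruned model. The linear model, the use of Pearson correlation, the one-hot calibration batch, and every name below (`grad_mse`, `pearson`, `proxy_score`, `candidates`) are assumptions made for illustration; none of them is taken from TraceNAS.

```python
import math

def grad_mse(weights, batch):
    """Gradient of mean squared error for a linear model y_hat = w . x."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if degenerate)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def proxy_score(base_weights, mask, batch):
    """Hypothetical zero-shot score: correlation between the gradients of
    the unpruned model and of the masked (pruned) model on one batch."""
    pruned = [w * m for w, m in zip(base_weights, mask)]
    return pearson(grad_mse(base_weights, batch), grad_mse(pruned, batch))

# Hypothetical 4-channel "model" and a tiny calibration batch of one-hot inputs.
base_weights = [2.0, -1.5, 0.1, 1.0]
target_weights = [1.8, -1.2, 0.0, 1.3]  # generates the calibration targets
inputs = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
batch = [(x, sum(w * xi for w, xi in zip(target_weights, x))) for x in inputs]

# Two width-pruning candidates: a mask entry of 0 removes that channel.
candidates = {
    "drop_small_channel": [1, 1, 0, 1],
    "drop_large_channel": [0, 1, 1, 1],
}
scores = {name: proxy_score(base_weights, mask, batch)
          for name, mask in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy, removing the low-magnitude channel leaves the gradient almost unchanged, so its score stays close to 1, while removing the high-magnitude channel gives a negative score. No candidate is trained at any point. A real LLM setting would involve depth as well as width and far richer cross-block interactions than this linear example can show.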

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning