MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models

Gabriel Afriat; Xiang Meng; Shibal Ibrahim; Hussein Hazimeh; Rahul Mazumder

arXiv:2604.13287·cs.LG·April 16, 2026

TL;DR

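MOONSHOT is a framework that wraps existing one-shot pruning methods and jointly optimizes two objectives, the layer-wise reconstruction error and a second-order Taylor approximation of the training loss, rather than a single one. The authors report lower perplexity and higher accuracy at a given sparsity on large models such as Llama and Vision Transformers.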

Contribution

The paper introduces a multi-objective formulation that extends any single-objective pruning method. It works in the post-training one-shot setting, so no retraining is required, and it is reported to apply to billion-parameter models.

Findings

01

Reduces perplexity by up to 32.6% on Llama-3.2 with 2:4 sparsity.

02

Improves zero-shot accuracy by up to 4.9 points across seven benchmarks.

03

Enhances ImageNet accuracy by over 5 points at 70% sparsity.
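
For readers unfamiliar with the 2:4 pattern mentioned in the first finding: it means that in every group of four consecutive weights, two are set to zero. The sketch below builds such a mask by keeping the two largest-magnitude entries per group. This is plain magnitude pruning, used here only to illustrate the sparsity pattern; it is not the MOONSHOT selection rule, and the function name `mask_2_4` is hypothetical.

```python
import numpy as np

def mask_2_4(weights: np.ndarray) -> np.ndarray:
    """Return a 0/1 mask that keeps the 2 largest-magnitude entries
    in each group of 4 consecutive weights along the last axis."""
    rows, cols = weights.shape
    assert cols % 4 == 0, "column count must be a multiple of 4"
    groups = np.abs(weights).reshape(rows, cols // 4, 4)
    # argsort is ascending, so the last two indices are the largest magnitudes
    keep = np.argsort(groups, axis=-1)[..., 2:]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=-1)
    return mask.reshape(rows, cols)

W = np.array([[0.1, -2.0, 0.3, 1.5, -0.7, 0.2, 0.05, 0.9]])
M = mask_2_4(W)
print(M)      # mask with exactly two ones per group of four
print(W * M)  # pruned weights
```

Every group of four in the result has exactly two nonzero entries, which corresponds to 50% sparsity with a fixed structure.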

Abstract

Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining…
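
As a rough illustration of the two objectives named in the abstract, the sketch below evaluates both for a single linear layer and mixes them with a scalar weight. Everything specific here is an assumption made for illustration: the weighted-sum form, the parameter `lam`, the diagonal Hessian approximation `hess_diag`, and all function names. The abstract shown on this page is truncated and does not state how MOONSHOT actually combines the objectives.

```python
import numpy as np

def reconstruction_error(W, W_pruned, X):
    """Layer-wise reconstruction error ||X W^T - X W_pruned^T||_F^2."""
    diff = X @ (W - W_pruned).T
    return float(np.sum(diff ** 2))

def taylor_loss_change(W, W_pruned, grad, hess_diag):
    """Second-order Taylor estimate of the loss change,
    g^T d + 0.5 * d^T H d, with H assumed diagonal for simplicity."""
    delta = (W_pruned - W).ravel()
    first_order = grad.ravel() @ delta
    second_order = 0.5 * np.sum(hess_diag.ravel() * delta ** 2)
    return float(first_order + second_order)

def joint_objective(W, W_pruned, X, grad, hess_diag, lam=0.5):
    """Hypothetical weighted sum of the two objectives (an assumption)."""
    rec = reconstruction_error(W, W_pruned, X)
    tay = taylor_loss_change(W, W_pruned, grad, hess_diag)
    return lam * rec + (1.0 - lam) * tay

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # dense layer weights (out x in)
X = rng.normal(size=(16, 8))     # calibration inputs (samples x in)
grad = np.zeros_like(W)          # assume the gradient is ~0 at the trained model
hess_diag = np.ones_like(W)      # placeholder diagonal curvature

# Candidate pruned weights: zero out entries below the median magnitude.
threshold = np.median(np.abs(W))
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

for lam in (0.0, 0.5, 1.0):
    print(lam, joint_objective(W, W_pruned, X, grad, hess_diag, lam))
```

Setting `lam=1.0` recovers the pure reconstruction objective and `lam=0.0` the pure Taylor objective, which makes it easy to see how a candidate mask scores under each criterion separately.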
