Structured Pruning for Diverse Best-of-N Reasoning Optimization

Hieu Trung Nguyen; Bao Nguyen; Viet Anh Nguyen

arXiv:2506.03978 · cs.CL · June 10, 2025

Open Access

TL;DR

This paper introduces SPRINT, a contrastive learning framework that dynamically selects which attention head and layer to prune at inference time, improving the reasoning performance of transformer language models on challenging tasks.

Contribution

The work shows that selectively pruning certain attention heads can improve reasoning, and proposes SPRINT to dynamically identify which heads to prune for more accurate answers.
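To make "pruning an attention head" concrete, here is a minimal sketch under a stated assumption: the head is pruned by zeroing its output before the per-head outputs are concatenated. This page does not say how the paper implements pruning, so the function `prune_head` and the toy values are hypothetical illustrations, not the authors' code.

```python
def prune_head(head_outputs, pruned_index):
    """Concatenate per-head outputs, replacing one head's output with zeros.

    Assumption: pruning is modeled as masking a head's contribution; the
    paper's actual mechanism is not described on this page.
    """
    concat = []
    for i, out in enumerate(head_outputs):
        if i == pruned_index:
            concat.extend([0.0] * len(out))  # pruned head contributes nothing
        else:
            concat.extend(out)
    return concat


# Toy example: three heads, each producing a 2-dimensional output.
toy_head_outputs = [[0.5, -1.0], [2.0, 0.3], [1.5, 1.5]]
pruned = prune_head(toy_head_outputs, pruned_index=1)
print(pruned)
```

Masking keeps the output shape unchanged, so the rest of the layer can run as before while the selected head is silenced.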

Findings

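01  SPRINT outperforms traditional best-of-$N$ and random head selection strategies on MATH500 and GSM8K.

02  Selectively pruning certain attention heads improves reasoning accuracy, particularly on challenging tasks.

03  Choosing the head and layer to prune dynamically, per question at inference time, yields more accurate reasoning.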

Abstract

Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can enhance the model's reasoning capabilities. In this work, we uncover a surprising phenomenon: the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-$N$ and random head selection strategies on the MATH500 and GSM8K datasets.
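The selection step described in the abstract, aligning a question embedding with head embeddings, could look roughly like the sketch below. It assumes cosine similarity as the alignment score and a dictionary of embeddings keyed by (layer, head). The scoring function, the names `cosine` and `select_heads_to_prune`, and the toy vectors are all illustrative assumptions; the abstract does not specify the actual scoring rule or how the embeddings are trained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_heads_to_prune(question_emb, head_embs, n):
    """Return the n (layer, head) keys whose embeddings best align with the question.

    Assumption: a higher similarity means pruning that head is more likely
    to help on this question.
    """
    ranked = sorted(
        head_embs,
        key=lambda key: cosine(question_emb, head_embs[key]),
        reverse=True,
    )
    return ranked[:n]


# Toy data: one question embedding and four hypothetical head embeddings.
question = [1.0, 0.0]
head_embeddings = {
    (0, 0): [0.9, 0.1],
    (0, 1): [0.0, 1.0],
    (1, 0): [-1.0, 0.0],
    (1, 1): [0.7, 0.7],
}
candidates = select_heads_to_prune(question, head_embeddings, n=2)
print(candidates)  # [(0, 0), (1, 1)]
```

In a best-of-$N$ setting, each selected (layer, head) pair would define one pruned variant of the model that generates one candidate answer, in place of $N$ samples from the unpruned model.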

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks