Structured Pruning for Diverse Best-of-N Reasoning Optimization
Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen

TL;DR
This paper introduces SPRINT, a contrastive learning framework that dynamically prunes attention heads in transformer models, improving reasoning performance on challenging tasks by selecting optimal head and layer configurations during inference.
Contribution
The work reveals that selective pruning of attention heads can enhance reasoning, and proposes SPRINT to dynamically identify the best heads for improved accuracy.
Findings
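SPRINT outperforms best-of-N and random head selection strategies on the MATH500 and GSM8K datasets.
Selectively pruning certain attention heads improves reasoning accuracy, particularly on challenging tasks.
Choosing the head and layer to prune per question during inference, by aligning question embeddings with head embeddings, identifies configurations that reason more accurately.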
Abstract
Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can enhance the model's reasoning capabilities. In this work, we uncover a surprising phenomenon: the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-N and random head selection strategies on the MATH500 and GSM8K datasets.
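The selection step described in the abstract, aligning a question embedding with head embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoders, the similarity measure, or the loss. Cosine similarity, the InfoNCE-style objective, and the names `select_head_to_prune`, `info_nce_loss`, and `head_embs` are all assumptions made here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_head_to_prune(question_emb, head_embs):
    """Pick the (layer, head) whose embedding best matches the question.

    head_embs is a hypothetical dict mapping (layer, head) -> embedding.
    Cosine similarity is an assumed alignment score.
    """
    scores = {key: cosine(question_emb, emb) for key, emb in head_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

def info_nce_loss(question_emb, head_embs, positive_key, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed objective, not the paper's).

    positive_key marks a head whose pruning gave a correct answer for this
    question; all other heads act as negatives.
    """
    logits = {k: cosine(question_emb, e) / temperature
              for k, e in head_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[positive_key]

# Toy data: three candidate heads with 2-dimensional embeddings.
head_embs = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (1, 0): [-1.0, 0.0]}
question = [0.9, 0.1]

best, scores = select_head_to_prune(question, head_embs)
print("head to prune (layer, head):", best)
```

In this toy setup the loss is small when the positive head is already the one closest to the question and large when it is the farthest, which is the behavior a contrastive objective would use to pull question embeddings toward helpful pruned-head configurations.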
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
