HARP: Hesitation-Aware Reframing in Transformer Inference Pass
Romain Storaï, Seung-won Hwang

TL;DR
HARP is a model-agnostic, training-free method that adaptively adds computation during Transformer inference by mimicking human hesitation. It improves downstream performance while keeping inference twice as fast as beam search.
Contribution
HARP introduces a novel hesitation-aware reframing technique that enhances Transformer inference without retraining or complex modifications.
Findings
Up to +5.16% performance improvement on downstream tasks
Inference is twice as fast as beam search
Applicable across various models and tasks
Abstract
This paper aims to improve the performance of large language models by addressing the variable computational demands of inference steps, where some tokens require more computational resources than others. We present HARP, a simple modification to the "off-the-shelf" Transformer forward pass. Drawing from hesitation and the framing effect in decision-making, HARP selectively applies additional computation when the model encounters uncertainty during token generation. Our method mimics human cognitive processes by pausing at difficult decision points and reframing inputs for a different perspective. Unlike other approaches, HARP is model-agnostic, training-free, and easy to implement. We evaluate our method across various downstream tasks and model sizes, demonstrating performance improvements of up to +5.16%. Notably, HARP achieves these gains while keeping inference twice as fast as…
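The abstract describes HARP only at a high level: detect uncertainty during generation, then reframe the input and spend extra computation on that step. The following is a minimal sketch of that control flow in plain Python, under explicit assumptions that do not come from this page: uncertainty is measured as the Shannon entropy of the next-token distribution, "reframing" is a second forward pass on a dropout-perturbed input embedding, and the two logit vectors are averaged. The names `harp_step`, `toy_forward`, `threshold`, and `drop_p` are hypothetical, and the toy linear model merely stands in for a real Transformer.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical toy interface: a "model" maps an input embedding to next-token logits.
ForwardFn = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats); higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dropout(vec: Sequence[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each entry with probability p, rescale the rest by 1/(1-p)."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def harp_step(
    forward: ForwardFn,
    embedding: Sequence[float],
    threshold: float = 1.0,  # hypothetical entropy threshold (nats)
    drop_p: float = 0.1,     # hypothetical dropout rate used for "reframing"
    rng: Optional[random.Random] = None,
) -> Tuple[int, bool]:
    """One greedy decoding step with hesitation-triggered reframing (assumed scheme).

    Returns (chosen token id, whether the extra reframed pass was run).
    """
    rng = rng if rng is not None else random.Random(0)
    logits = forward(embedding)
    if entropy(softmax(logits)) <= threshold:
        # Confident step: a single forward pass, as in ordinary greedy decoding.
        return argmax(logits), False
    # Hesitation: run a second pass on a perturbed ("reframed") input
    # and average the two logit vectors before picking a token.
    reframed = forward(dropout(embedding, drop_p, rng))
    combined = [(a + b) / 2.0 for a, b in zip(logits, reframed)]
    return argmax(combined), True


if __name__ == "__main__":
    # Toy linear "model": 3-token vocabulary, 2-dimensional embedding.
    W = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

    def toy_forward(emb: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, emb)) for row in W]

    print(harp_step(toy_forward, [5.0, 0.0]))  # peaked logits: no hesitation
    print(harp_step(toy_forward, [0.5, 0.5]))  # flat logits: reframed pass runs
```

The point the sketch is meant to illustrate is that the extra computation is conditional: confident steps cost one forward pass, and only hesitant steps pay for a second one, whereas beam search expands several candidates at every step.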
