Hopscotch: Discovering and Skipping Redundancies in Language Models

Mustafa Eyceoz; Nikhil Shivakumar Nayak; Hao Wang; Ligong Han; Akash Srivastava

arXiv:2506.03303 · cs.CL · September 16, 2025

TL;DR

Hopscotch is a method that identifies and skips less important attention blocks in language models, reducing computational cost while maintaining high output quality, without retraining the entire model.

Contribution

Hopscotch introduces lightweight, trainable scaling parameters that let a language model skip selected attention blocks while preserving performance, without modifying the original weights or requiring access to pretraining or instruction-tuning data.

Findings

01. Less than a 2% drop in performance after skipping four attention blocks in the tested models (Llama-3.1-8B and Qwen2.5-7B).

02. Compatible with existing model compression techniques; the original model weights are left unmodified.

03. Reduces computational cost by skipping the attention blocks that contribute least to a task.

Abstract

Modern causal language models stack many attention blocks to improve performance, but not all blocks are necessary for every task. We propose Hopscotch, a simple yet effective method that identifies and skips the attention blocks that contribute least to a task and adapts the model to preserve output quality. Hopscotch jointly optimizes which blocks to skip and how to scale the outputs of the remaining layers. By introducing lightweight, trainable scaling parameters to attention and MLP blocks, it mitigates distribution shifts in hidden states caused by removing attention blocks. Hopscotch does not modify model weights or require access to pretraining or instruction-tuning data, and is compatible with existing model compression techniques. When applied to Llama-3.1-8B and Qwen2.5-7B, Hopscotch achieves less than a 2% drop in performance even after skipping four attention…
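
The mechanism described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the class name `ScaledBlock`, the stand-in functions `toy_attn` and `toy_mlp`, and the choice to apply each scale to the sublayer output before the residual addition are all assumptions made for illustration. The sketch only shows the forward pass with a fixed skip decision; the joint optimization of which blocks to skip and how to scale is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class ScaledBlock:
    """Toy residual block with one scalar scale per sublayer (hypothetical)."""

    attn: Callable[[Vector], Vector]  # stand-in for a frozen attention sublayer
    mlp: Callable[[Vector], Vector]   # stand-in for a frozen MLP sublayer
    attn_scale: float = 1.0           # the only trainable values in this sketch
    mlp_scale: float = 1.0
    skip_attn: bool = False           # True means the attention sublayer is skipped

    def forward(self, h: Vector) -> Vector:
        # Assumed placement: the scale multiplies the sublayer output
        # before it is added back onto the residual stream.
        if not self.skip_attn:
            a = self.attn(h)
            h = [x + self.attn_scale * y for x, y in zip(h, a)]
        m = self.mlp(h)
        return [x + self.mlp_scale * y for x, y in zip(h, m)]


def toy_attn(h: Vector) -> Vector:
    # Mixes positions by returning the mean at every position.
    mean = sum(h) / len(h)
    return [mean for _ in h]


def toy_mlp(h: Vector) -> Vector:
    # Position-wise transform.
    return [0.5 * x for x in h]


def run(blocks: List[ScaledBlock], h: Vector) -> Vector:
    for block in blocks:
        h = block.forward(h)
    return h


full = ScaledBlock(toy_attn, toy_mlp)
skipped = ScaledBlock(toy_attn, toy_mlp, mlp_scale=2.0, skip_attn=True)
out_full = run([full], [1.0, 3.0])
out_skipped = run([skipped], [1.0, 3.0])
```

In this toy setting, skipping attention changes the hidden state that later layers receive, and the remaining scale (`mlp_scale` here) is the knob that would be tuned to compensate for that shift while the sublayer weights stay frozen.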

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning