Void in Language Models

Mani Shemiranifar

arXiv:2505.14467 · cs.CL · May 21, 2025

TL;DR

This paper introduces a method to detect unactivated layers, called Voids, in transformer-based language models during inference, revealing that selectively skipping layers can improve task performance.

Contribution

The paper adapts L2 Adaptive Computation to identify Voids in LMs, demonstrating that many layers are inactive and that skipping them can enhance model accuracy.

Findings

01. Skipping Voids improves model performance on benchmarks.
02. Different layers activate during prompt processing and response generation.
03. Selective layer skipping reduces computational load while maintaining accuracy.

Abstract

Despite advances in transformer-based language models (LMs), a fundamental question remains largely unanswered: Are all layers activated during inference? We investigate this question by detecting unactivated layers (which we refer to as Voids) using a non-trainable and parameter-free adaptive computation method called L2 Adaptive Computation (LAC). We adapt LAC from its original efficiency-focused application to trace activated layers during inference. This method monitors changes in the L2-norm of activations to identify voids. We analyze layer activation in instruction-tuned LMs across two phases: Prompt Processing (PP), where we trace activated layers for each token in the input prompts, and Response Generation (RG), where we trace activated layers for each generated token. We further demonstrate that distinct layers are activated during these two phases. To show the effectiveness…
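The idea of monitoring changes in the L2-norm of activations can be illustrated with a minimal sketch. The excerpt above does not give LAC's exact decision rule, so the criterion used here (flag a layer when the relative change in norm falls below a fixed threshold) is an assumption made for illustration, as are the function names `l2_norm` and `trace_voids` and the `threshold` parameter. It should not be read as the paper's implementation.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def trace_voids(layer_outputs, threshold=0.05):
    """Flag layers whose output barely changes the activation norm.

    layer_outputs: hidden states for ONE token; index 0 is the input to the
    first layer, followed by one vector per layer output.
    threshold: hypothetical cutoff on the relative norm change (an assumption,
    not a value taken from the paper).
    Returns one bool per layer; True means "flagged as a void" in this sketch.
    """
    flags = []
    prev = l2_norm(layer_outputs[0])
    for hidden in layer_outputs[1:]:
        cur = l2_norm(hidden)
        # Relative change in L2-norm from the previous layer's output.
        rel_change = abs(cur - prev) / max(prev, 1e-12)
        flags.append(rel_change < threshold)
        prev = cur
    return flags


# Toy hidden states for a single token passing through three layers.
states = [[3.0, 4.0], [3.0, 4.1], [6.0, 8.0], [6.0, 8.0]]
flags = trace_voids(states)
void_layers = [i for i, is_void in enumerate(flags) if is_void]
print(flags, void_layers)
```

In a real model the per-layer hidden states would come from the forward pass (for example, one vector per token per layer), and the same trace would be collected separately for prompt tokens and for generated tokens to compare the two phases.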

Code & Models

Repositories

manishemirani/void_in_language_models
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques