Uncovering Layer-Dependent Activation Sparsity Patterns in ReLU Transformers

Cody Wild; Jesper Anderson

arXiv:2407.07848·cs.LG·July 11, 2024

Open Access

TL;DR

This paper investigates how activation sparsity patterns in ReLU Transformers evolve during training, revealing layer-specific behaviors and the influence of training dynamics on neuron activity.

Contribution

The paper provides a detailed analysis of layer-dependent sparsity patterns in ReLU Transformers and of the mechanisms that cause neurons to 'turn off' during training.

Findings

01. Each layer exhibits its own sparsity pattern, both per token and across a sequence or batch.

02. The first and last layers have distinctive, and in many ways inverted, relationships to sparsity.

03. Neuron 'death' is primarily driven by training dynamics, not randomness.

Abstract

Previous work has demonstrated that MLPs within ReLU Transformers exhibit high levels of sparsity, with many of their activations equal to zero for any given token. We build on that work to more deeply explore how token-level sparsity evolves over the course of training, and how it connects to broader sparsity patterns over the course of a sequence or batch, demonstrating that the different layers within small transformers exhibit distinctly layer-specific patterns on both of these fronts. In particular, we demonstrate that the first and last layer of the network have distinctive and in many ways inverted relationships to sparsity, and explore implications for the structure of feature representations being learned at different depths of the model. We additionally explore the phenomenon of ReLU dimensions "turning off", and show evidence suggesting that "neuron death" is being primarily…
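To make the two notions of sparsity in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names, the toy dimensions, and the exact metric definitions (fraction of hidden units that are zero for one token, and fraction that are zero for every token in a batch) are assumptions chosen for illustration. A unit that is never active across a batch is treated here as a candidate "turned off" dimension.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def mlp_hidden_activations(tokens, weights, biases):
    """Post-ReLU hidden activations of one MLP layer, one row per token.

    weights has one row per hidden unit; each row has one entry per input dim.
    """
    return [
        [
            relu(sum(w * x for w, x in zip(unit_weights, token)) + bias)
            for unit_weights, bias in zip(weights, biases)
        ]
        for token in tokens
    ]


def token_sparsity(acts):
    """Per token: fraction of hidden units whose activation is exactly zero."""
    return [sum(a == 0.0 for a in row) / len(row) for row in acts]


def batch_sparsity(acts):
    """Fraction of hidden units that are zero for every token in the batch.

    Also returns the per-unit mask of never-active units.
    """
    n_hidden = len(acts[0])
    never_active = [all(row[j] == 0.0 for row in acts) for j in range(n_hidden)]
    return sum(never_active) / n_hidden, never_active


# Toy setup (hypothetical sizes, random untrained weights).
rng = random.Random(0)
d_model, d_hidden, n_tokens = 8, 32, 16
weights = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
biases = [0.0] * d_hidden
# Force unit 0 off for illustration: inputs and weights lie in [-1, 1],
# so its pre-activation can never exceed 8 - 100 < 0.
biases[0] = -100.0
tokens = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(n_tokens)]

acts = mlp_hidden_activations(tokens, weights, biases)
per_token = token_sparsity(acts)
batch_frac, never_active = batch_sparsity(acts)

print("mean token-level sparsity:", sum(per_token) / len(per_token))
print("batch-level sparsity:", batch_frac)
print("never-active units:", [j for j, off in enumerate(never_active) if off])
```

Under these definitions the batch-level figure can never exceed any single token's sparsity, because a unit that is zero for the whole batch is zero for each token in it. The gap between the two numbers is one way to read how much the set of active units changes from token to token.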

Taxonomy

Topics: Advanced Memory and Neural Computing · Semiconductor materials and devices · Advancements in Semiconductor Devices and Circuit Design