Towards Explainability of SLMs by investigating Token Level Activation

Sayantani Ghosh; Rajashik Datta; Amit Kumar Das; Amlan Chakrabarti

arXiv:2605.22377·cs.LG·May 22, 2026

TL;DR

This paper proposes a lightweight, model-agnostic framework that ranks token importance in BERT by hidden-state activation strength, and reports that Layer 8 concentrates on semantic content.

Contribution

It introduces the Activation Flow Network (AFN) framework and a threshold-based activation bucket method for identifying semantically salient tokens in BERT.

Findings

01

Semantically meaningful tokens occupy the high-activation bucket.

02

Layer 8 acts as a semantic consolidation zone.

03

Activation magnitudes rank semantically important tokens above structural ones such as punctuation.

Abstract

Transformer-based language models such as BERT, with 110M+ parameters, have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically weak tokens, such as punctuation marks, rather than meaningful semantic relationships. This work introduces a lightweight and model-agnostic framework for quantifying token-level representational importance using hidden-state activation strengths at Layer 8 of BERT. The proposed Activation Flow Network (AFN) framework computes Token Activation Strength as the L2 norm of Layer-8 hidden representations, enabling direct ranking of semantically salient tokens. The study further introduces a threshold-based activation bucket formulation that partitions tokens into…
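
To make the Token Activation Strength idea concrete, here is a minimal sketch, not the authors' implementation. It takes one hidden vector per token, computes each vector's L2 norm, ranks tokens by that norm, and splits them with a threshold. The function names, the toy vectors, the two-way high/low split, and the mean-based threshold are all illustrative assumptions; the abstract as shown does not specify the bucket boundaries.

```python
import math

def token_activation_strength(hidden_states):
    """Return the L2 norm of each token's hidden vector."""
    return [math.sqrt(sum(x * x for x in vec)) for vec in hidden_states]

def rank_tokens(tokens, strengths):
    """Pair tokens with strengths and sort from strongest to weakest."""
    return sorted(zip(tokens, strengths), key=lambda p: p[1], reverse=True)

def bucket_tokens(tokens, strengths, threshold):
    """Hypothetical two-way split: the paper's actual bucket rule may differ."""
    buckets = {"high": [], "low": []}
    for token, strength in zip(tokens, strengths):
        buckets["high" if strength >= threshold else "low"].append(token)
    return buckets

# Toy stand-in for Layer-8 hidden states (real BERT-base vectors have 768 dims).
tokens = ["the", "model", "learns", "."]
layer8 = [[0.3, 0.4], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]

strengths = token_activation_strength(layer8)
ranked = rank_tokens(tokens, strengths)

# Assumed threshold: the mean strength over the sentence.
threshold = sum(strengths) / len(strengths)
buckets = bucket_tokens(tokens, strengths, threshold)

for token, strength in ranked:
    print(f"{token:>8s}  {strength:.3f}")
print(buckets)
```

With a real model, the toy vectors would be replaced by the per-token rows of the Layer-8 hidden state. In Hugging Face Transformers, for example, requesting `output_hidden_states=True` returns a tuple whose index 0 is the embedding output, so index 8 corresponds to the output of the eighth encoder layer.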
