Towards Explainability of SLMs by investigating Token Level Activation
Sayantani Ghosh, Rajashik Datta, Amit Kumar Das, Amlan Chakrabarti

TL;DR
This paper proposes a lightweight, model-agnostic framework that uses hidden-state activation strengths to rank token importance in BERT, and reports that Layer 8 concentrates on semantic content.
Contribution
The paper introduces the Activation Flow Network (AFN) framework and a threshold-based activation bucket method for identifying semantically salient tokens in BERT, with the aim of improving interpretability.
Findings
Semantically meaningful tokens occupy the high-activation bucket.
Layer 8 acts as a semantic consolidation zone.
Activation magnitudes highlight semantically important tokens over structural ones such as punctuation.
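
To make the bucket idea concrete, here is a minimal sketch of a threshold-based split. The source does not state how many buckets the paper uses or how the thresholds are chosen, so the single caller-supplied `threshold`, the bucket names `high` and `other`, the function name `bucket_tokens`, and the strength values are all hypothetical and for illustration only.

```python
from typing import Dict, List, Sequence


def bucket_tokens(
    tokens: Sequence[str],
    strengths: Sequence[float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Split tokens into a 'high' bucket and an 'other' bucket.

    ASSUMPTION: a single threshold and two buckets are illustrative only;
    the paper's actual bucket count and threshold rule are not given here.
    """
    if len(tokens) != len(strengths):
        raise ValueError("tokens and strengths must have the same length")

    buckets: Dict[str, List[str]] = {"high": [], "other": []}
    for token, strength in zip(tokens, strengths):
        # Tokens at or above the threshold go to the high-activation bucket.
        key = "high" if strength >= threshold else "other"
        buckets[key].append(token)
    return buckets


# Made-up activation strengths, not results from the paper.
tokens = ["[CLS]", "stock", "prices", "fell", ".", "[SEP]"]
strengths = [1.0, 5.0, 4.5, 4.0, 0.5, 0.8]
buckets = bucket_tokens(tokens, strengths, threshold=3.0)
```

Passing the threshold in as a parameter keeps the sketch neutral about how the cut-off is actually derived.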
Abstract
Transformer-based language models such as BERT, with 110M+ parameters, have revolutionized natural language understanding, yet their internal mechanisms remain largely opaque to researchers and practitioners. Traditional attention-based interpretability methods often emphasize structurally important but semantically weak tokens, such as punctuation marks, rather than meaningful semantic relationships. This work introduces a lightweight and model-agnostic framework for quantifying token-level representational importance using hidden-state activation strengths at Layer 8 of BERT. The proposed Activation Flow Network (AFN) framework computes Token Activation Strength as the L2 norm of Layer-8 hidden representations, enabling direct ranking of semantically salient tokens. The study further introduces a threshold-based activation bucket formulation that partitions tokens into…
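
The Token Activation Strength described in the abstract can be sketched as a per-token L2 norm followed by a sort. This is a minimal illustration, not the authors' implementation: the function names and the toy two-dimensional vectors are hypothetical. In practice the vectors might be the Layer-8 entry of the `hidden_states` tuple that a Hugging Face BERT model returns when called with `output_hidden_states=True`, but that pipeline is an assumption; the source does not name one.

```python
import math
from typing import List, Sequence, Tuple


def token_activation_strength(hidden: Sequence[Sequence[float]]) -> List[float]:
    """Return the L2 norm of each token's hidden-state vector."""
    return [math.sqrt(sum(x * x for x in vector)) for vector in hidden]


def rank_tokens(
    tokens: Sequence[str],
    hidden: Sequence[Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank tokens from highest to lowest activation strength."""
    if len(tokens) != len(hidden):
        raise ValueError("tokens and hidden must have the same length")
    strengths = token_activation_strength(hidden)
    return sorted(zip(tokens, strengths), key=lambda pair: pair[1], reverse=True)


# Toy stand-in for Layer-8 hidden states (real BERT-base vectors have 768 dims).
tokens = ["the", "cat", "sat", "."]
hidden = [[0.3, 0.4], [3.0, 4.0], [1.5, 2.0], [0.0, 0.1]]
ranking = rank_tokens(tokens, hidden)
```

With these toy vectors, "cat" ranks first and the full stop last, which mirrors the kind of ordering the abstract describes without reproducing any result from the paper.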
