nnterp: A Standardized Interface for Mechanistic Interpretability of Transformers

Clément Dumas

arXiv:2511.14465·cs.LG·December 16, 2025

TL;DR

nnterp offers a standardized interface for transformer interpretability that works across diverse architectures, combining the exact numerical behavior of the original HuggingFace implementations with the consistent interface of custom tools such as TransformerLens.

Contribution

The paper introduces nnterp, a lightweight wrapper around NNsight that standardizes transformer analysis through automatic module renaming, enabling cross-architecture interpretability with validation tests and built-in methods.

Findings

01. Supports 50+ models across 16 architectures

02. Ensures consistent analysis with validation tests

03. Includes common interpretability tools

Abstract

Mechanistic interpretability research requires reliable tools for analyzing transformer internals across diverse architectures. Current approaches face a fundamental tradeoff: custom implementations like TransformerLens ensure consistent interfaces but require coding a manual adaptation for each architecture, introducing numerical mismatch with the original models, while direct HuggingFace access through NNsight preserves exact behavior but lacks standardization across models. To bridge this gap, we develop nnterp, a lightweight wrapper around NNsight that provides a unified interface for transformer analysis while preserving original HuggingFace implementations. Through automatic module renaming and comprehensive validation testing, nnterp enables researchers to write intervention code once and deploy it across 50+ model variants spanning 16 architecture families. The library includes…
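The idea of automatic module renaming can be illustrated with a minimal sketch. This is not nnterp's implementation or API: the rule table, the `standardize` and `missing_modules` helpers, and the standardized names (`layers.N.self_attn`, `ln_final`) are hypothetical and chosen only for illustration. The architecture-specific names on the left follow the HuggingFace GPT-2 and Llama module layouts.

```python
import re

# Hypothetical rename rules: (architecture-specific pattern, standardized name).
# Left-hand names follow the HuggingFace GPT-2 and Llama layouts; the
# right-hand names are illustrative assumptions, not nnterp's actual scheme.
RENAME_RULES = [
    (r"^transformer\.h\.(\d+)\.attn\b", r"layers.\1.self_attn"),  # GPT-2 attention
    (r"^transformer\.h\.(\d+)", r"layers.\1"),                    # GPT-2 block
    (r"^transformer\.ln_f\b", "ln_final"),                        # GPT-2 final norm
    (r"^model\.layers\.(\d+)", r"layers.\1"),                     # Llama block
    (r"^model\.norm\b", "ln_final"),                              # Llama final norm
]


def standardize(path: str) -> str:
    """Map an architecture-specific module path to a standardized one.

    The first matching rule wins; paths that match no rule are returned unchanged.
    """
    for pattern, replacement in RENAME_RULES:
        new_path, count = re.subn(pattern, replacement, path)
        if count:
            return new_path
    return path


def missing_modules(paths, required):
    """Return the required standardized names absent after renaming (a toy validation check)."""
    available = {standardize(p) for p in paths}
    return sorted(set(required) - available)


gpt2_paths = ["transformer.h.0.attn", "transformer.h.0.mlp", "transformer.ln_f", "lm_head"]
llama_paths = ["model.layers.0.self_attn", "model.layers.0.mlp", "model.norm", "lm_head"]
required = ["layers.0.self_attn", "layers.0.mlp", "ln_final", "lm_head"]

# Both layouts expose the same standardized names after renaming.
print(missing_modules(gpt2_paths, required))
print(missing_modules(llama_paths, required))
```

With a mapping of this kind, code that refers to `layers.0.self_attn` would not need to change between the two layouts, and a check like `missing_modules` loosely mirrors the role the abstract assigns to validation testing.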

Taxonomy

Topics: Power Transformer Diagnostics and Insulation · Explainable Artificial Intelligence (XAI) · Magnetic Properties and Applications