Distributed Interpretability and Control for Large Language Models

Dev Arpan Desai; Shaoyi Huang; Zining Zhu

arXiv:2604.06483·cs.LG·April 9, 2026

TL;DR

This paper introduces a scalable system for interpretability and control of language models that span multiple GPUs. It reduces activation memory by up to 7x and increases throughput by up to 41x over a baseline on identical hardware, and it enables real-time output steering without fine-tuning.

Contribution

It presents a practical multi-GPU implementation of logit-lens interpretability and steering-vector control that lowers activation memory and raises throughput, demonstrated on LLaMA-3.1 (8B, 70B) and Qwen-3 (4B, 14B, 32B).

Findings

01 Activation memory reduced by up to 7x compared to a baseline on identical hardware

02 Throughput increased by up to 41x compared to the same baseline

03 Achieved controllable output shifts with high steerability

Abstract

Large language models that require multiple GPU cards to host are usually the most capable models. It is necessary to understand and steer these models, but current tools do not support interpretability and steering in the multi-GPU setting as well as they do in the single-GPU setting. We present a practical implementation of activation-level interpretability (logit lens) and steering (steering vector) that scales up to multi-GPU language models. Our system implements design choices that reduce the activation memory by up to 7x and increase the throughput by up to 41x compared to a baseline on identical hardware. We demonstrate the method across LLaMA-3.1 (8B, 70B) and Qwen-3 (4B, 14B, 32B), sustaining 20-100 tokens/s while collecting full layer-wise activation trajectories for sequences of 1,500 tokens. Using label-position steering vectors injected post-LayerNorm,…
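The two techniques named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' multi-GPU system: the tensor shapes, the gain-free `layer_norm`, the random unembedding matrix, and the helper names `logit_lens` and `steer` are all assumptions made for illustration. The logit lens projects every layer's hidden state through the normalization and the unembedding matrix to read off a per-layer token prediction; a steering vector is added to the normalized hidden state to shift the output logits.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Simplified normalization without learned gain or bias (an assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def logit_lens(hidden_states, unembed):
    # hidden_states: (layers, tokens, d_model); unembed: (d_model, vocab).
    # Returns the top-scoring token id at every layer and position.
    logits = layer_norm(hidden_states) @ unembed
    return logits.argmax(axis=-1)

def steer(hidden, vector, alpha):
    # Add a scaled steering vector after normalization (post-norm injection).
    return layer_norm(hidden) + alpha * vector

# Toy stand-ins for real model activations and weights (hypothetical sizes).
rng = np.random.default_rng(0)
n_layers, seq_len, d_model, vocab = 4, 6, 16, 50
hidden_states = rng.normal(size=(n_layers, seq_len, d_model))
unembed = rng.normal(size=(d_model, vocab))

# Layer-wise trajectory of predicted token ids, shape (n_layers, seq_len).
top_tokens = logit_lens(hidden_states, unembed)

# Steer the last layer toward a chosen token by pushing along its
# unembedding direction (one simple way to build a steering vector).
target = 7
steering_vector = unembed[:, target] / np.linalg.norm(unembed[:, target])
last_hidden = hidden_states[-1]
base_logits = layer_norm(last_hidden) @ unembed
steered_logits = steer(last_hidden, steering_vector, alpha=4.0) @ unembed
```

In this sketch the logit for `target` rises at every position, because the added term is `alpha` times the norm of that token's unembedding column. A real system would read hidden states from model hooks and would have to gather them across devices, which is the part the paper addresses.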

Code & Models

Repositories

Devdesai1901/LogitLense (GitHub)
