Enabling Performant and Flexible Model-Internal Observability for LLM Inference

Nengneng Yu; Sixian Xiong; Yibo Zhao; Wei Wang; Zaoxing Liu

arXiv:2605.11093·cs.LG·May 13, 2026

TL;DR

DMI-Lib is a high-speed, flexible system for observing a large language model's internal states during inference. It keeps detailed access to those states while cutting latency overhead by 2x-15x relative to baselines with similar observability.

Contribution

The paper introduces DMI-Lib, an asynchronous observability system that decouples internal-state monitoring from the inference hot path, so detailed internal signals can be collected efficiently.

Findings

01

Incurs only 0.4%-6.8% overhead in offline batch inference

02

Averages 6% overhead in moderate online serving

03

Reduces latency overhead by 2x-15x compared to existing baselines with similar observability

Abstract

Today's inference-time workloads increasingly depend on timely access to a model's internal states. We present DMI-Lib, a high-speed deep model inspector that treats internal observability as a first-class systems primitive, decoupling it from the inference hot path via an asynchronous observability substrate built from Ring^2, a GPU-CPU memory abstraction for capturing and staging tensors, and a policy-controlled host backend that exports them. DMI-Lib enables the placement of observation points across a rich space of internal signals and diverse inference backends while preserving serving optimizations and adhering to tight GPU memory budgets. Our experiments demonstrate that DMI-Lib incurs only 0.4%-6.8% overhead in offline batch inference and an average of 6% in moderate online serving, reducing latency overhead by 2x-15x compared to existing baselines with similar observability…
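The decoupling idea in the abstract (capture on the hot path, export asynchronously under a fixed memory budget) can be illustrated with a small producer/consumer sketch. This is not DMI-Lib's API and not the actual Ring^2 design; `StagingRing`, `run_exporter`, `observe_run`, and the drop-oldest policy are hypothetical names and choices made for illustration, and plain Python lists stand in for GPU tensors.

```python
import threading
from collections import deque


class StagingRing:
    """Hypothetical fixed-capacity staging buffer (illustration only).

    push() never waits for the consumer: when the buffer is full it
    applies a drop policy instead of stalling the producer.
    """

    def __init__(self, capacity, drop_oldest=True):
        self._buf = deque()
        self._capacity = capacity
        self._drop_oldest = drop_oldest
        self._cond = threading.Condition()
        self._closed = False
        self.dropped = 0  # number of items lost to the drop policy

    def push(self, item):
        with self._cond:
            if len(self._buf) >= self._capacity:
                self.dropped += 1
                if not self._drop_oldest:
                    return False  # reject the new item, keep the old ones
                self._buf.popleft()  # evict the oldest staged item
            self._buf.append(item)
            self._cond.notify()
            return True

    def pop(self):
        """Block until an item is available; return None once closed and drained."""
        with self._cond:
            while not self._buf and not self._closed:
                self._cond.wait()
            if self._buf:
                return self._buf.popleft()
            return None

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()


def run_exporter(ring, sink):
    """Consumer loop: drain the ring off the hot path and export to a sink."""
    while True:
        item = ring.pop()
        if item is None:
            break
        sink.append(item)


def observe_run(num_steps, capacity):
    """Simulate an inference loop that stages one 'tensor' per step."""
    ring = StagingRing(capacity)
    exported = []
    worker = threading.Thread(target=run_exporter, args=(ring, exported))
    worker.start()
    for step in range(num_steps):
        hidden = [float(step)] * 4  # stand-in for a captured internal state
        ring.push((step, hidden))   # hot path: stage the item and move on
    ring.close()
    worker.join()
    return exported, ring.dropped


if __name__ == "__main__":
    exported, dropped = observe_run(num_steps=1000, capacity=64)
    print(f"exported={len(exported)} dropped={dropped}")
```

The fixed capacity models a tight memory budget: when the exporter falls behind, the policy decides what is lost, and the producer loop is never stalled by a slow consumer. How many items are dropped depends on thread scheduling, so the counts vary from run to run.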

Code & Models

Repositories

ProjectDMX/DMI (GitHub)
