Timing and Memory Telemetry on GPUs for AI Governance
Saleh K. Monfared, Fatemeh Ganji, Dan Holcomb, Shahin Tajik

TL;DR
This paper presents a GPU telemetry framework that uses timing and memory measurements to monitor AI compute activity when both host and device are untrusted.
Contribution
The paper introduces four complementary primitives that exploit architectural characteristics of modern GPUs to generate observable signals of compute activity without relying on trusted firmware or counters.
Findings
Timing and residency latencies correlate with GPU utilization.
Primitive responses vary with contention, memory pressure, and power.
Telemetry signals remain observable in untrusted environments.
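To make the first finding concrete, here is a minimal sketch of how a correlation between probe latency and utilization could be checked. It is not the authors' method. The function name `pearson`, the variable names, and all sample values are hypothetical and invented purely for illustration; none of the numbers come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up samples (NOT measurements from the paper):
# background GPU utilization in percent, and the latency in microseconds
# that an imagined timing probe observed at each utilization level.
utilization_pct = [5, 20, 40, 60, 80, 95]
probe_latency_us = [10.2, 11.0, 12.9, 14.8, 17.1, 18.9]

r = pearson(utilization_pct, probe_latency_us)
print(f"Pearson r = {r:.3f}")
```

A coefficient near 1 on real measurements would be consistent with the claim that latency tracks utilization. It would not by itself show that the signal survives contention, memory pressure, or an adversarial host, which the other findings address.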
Abstract
The rapid expansion of GPU-accelerated computing has enabled major advances in large-scale artificial intelligence (AI), while heightening concerns about how accelerators are observed or governed once deployed. Governance is essential to ensure that large-scale compute infrastructure is not silently repurposed for training models, circumventing usage policies, or operating outside legal oversight. Because current GPUs expose limited trusted telemetry and can be modified or virtualized by adversaries, we explore whether compute-based measurements can provide actionable signals of utilization when host and device are untrusted. We introduce a measurement framework that leverages architectural characteristics of modern GPUs to generate timing- and memory-based observables that correlate with compute activity. Our design draws on four complementary primitives: (1) a probabilistic,…
Taxonomy
Topics: Security and Verification in Computing · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
