Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification
Vihaan Patel, Vidya Chhabria, Aman Arora

TL;DR
This paper empirically analyzes the limits of LLM-based hardware verification, focusing on token allocation, coverage gaps, and efficiency improvements through domain specialization.
Contribution
The paper introduces a two-tier agentic framework that characterizes coverage holes, tracks token usage, and demonstrates efficiency gains from domain-specific models.
Findings
The enhanced system achieves 95-99% coverage while using 4-13x fewer tokens.
Domain specialization shifts token allocation toward reasoning.
The study exposes fundamental limits of purely LLM-driven verification.
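To make "token allocation" concrete, the sketch below shows one way to tally an agent's tokens by phase and compute a reduction factor between two runs. It is a minimal illustration under stated assumptions, not the paper's instrumentation. The phase labels (`reasoning`, `tool_call`, `tool_output`) and the names `TokenLedger` and `reduction_factor` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical phase labels; the paper's own accounting categories may differ.
PHASES = ("reasoning", "tool_call", "tool_output")


@dataclass
class TokenLedger:
    """Accumulates token counts per agent phase for one verification run."""

    counts: Counter = field(default_factory=Counter)

    def record(self, phase: str, n_tokens: int) -> None:
        # Reject unknown labels so typos do not silently create new phases.
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.counts[phase] += n_tokens

    def total(self) -> int:
        return sum(self.counts.values())

    def share(self, phase: str) -> float:
        """Fraction of all recorded tokens spent in the given phase."""
        total = self.total()
        return self.counts[phase] / total if total else 0.0


def reduction_factor(baseline: TokenLedger, candidate: TokenLedger) -> float:
    """How many times fewer tokens the candidate run used than the baseline."""
    if candidate.total() == 0:
        raise ValueError("candidate run recorded no tokens")
    return baseline.total() / candidate.total()
```

With made-up numbers purely for illustration: a baseline run of 20,000 reasoning and 80,000 tool-output tokens has a reasoning share of 0.2. A specialized run of 12,000 reasoning and 8,000 tool-output tokens has a reasoning share of 0.6 and a reduction factor of 5.0. A shift of this kind is what "allocation toward reasoning" would look like under this accounting.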
Abstract
Coverage closure is the most time-consuming phase of hardware verification, and recent large language model (LLM)-based coding agents offer a promising approach to automated stimulus generation. However, prior LLM-based flows do not systematically analyze which coverage holes remain difficult to close or how inference-time computation is allocated during agentic verification. As a result, the efficiency limits and failure modes of LLM-based coverage closure remain poorly understood, particularly for large designs. We present an empirical study using a two-tier agentic framework comprising a base Codex agent and an enhanced domain-specialized LangGraph system. Our framework enables a taxonomy of coverage holes: methodology-bound ceilings (integration tied-off hardware, infeasible boundaries, dead code) and reasoning frontiers (protocol sequencing, multi-module pipeline warm-up, narrow…
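The two-level taxonomy in the abstract can be encoded as a simple lookup from sub-category to class. The sketch below is a hypothetical encoding, not the authors' code. It includes only the sub-categories named above, because the reasoning-frontier list is truncated in the source. The names `HoleClass`, `CoverageHole`, `summarize`, and `reachable_in_principle` are illustrative. The reachability rule is an interpretation: it treats methodology-bound ceilings as holes that no stimulus alone can close.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum


class HoleClass(Enum):
    METHODOLOGY_BOUND = "methodology-bound ceiling"
    REASONING_FRONTIER = "reasoning frontier"


# Sub-categories named in the abstract; the reasoning-frontier list is truncated there.
SUBCATEGORY_TO_CLASS = {
    "integration tied-off hardware": HoleClass.METHODOLOGY_BOUND,
    "infeasible boundary": HoleClass.METHODOLOGY_BOUND,
    "dead code": HoleClass.METHODOLOGY_BOUND,
    "protocol sequencing": HoleClass.REASONING_FRONTIER,
    "multi-module pipeline warm-up": HoleClass.REASONING_FRONTIER,
}


@dataclass(frozen=True)
class CoverageHole:
    bin_name: str     # hypothetical coverage-bin identifier
    subcategory: str  # must be a key of SUBCATEGORY_TO_CLASS

    @property
    def hole_class(self) -> HoleClass:
        return SUBCATEGORY_TO_CLASS[self.subcategory]


def summarize(holes: list[CoverageHole]) -> dict[HoleClass, int]:
    """Count remaining holes per top-level class."""
    return dict(Counter(h.hole_class for h in holes))


def reachable_in_principle(hole: CoverageHole) -> bool:
    # Interpretation (assumption): ceilings cannot be closed by stimulus alone,
    # whereas reasoning frontiers are reachable but hard for the agent.
    return hole.hole_class is HoleClass.REASONING_FRONTIER
```

Separating the two classes this way lets a coverage report distinguish holes that bound any stimulus-only approach from holes that a stronger agent might still close.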
