Predictive Auditing of Hidden Tokens in LLM APIs via Reasoning Length Estimation

Ziyao Wang; Guoheng Sun; Yexiao He; Zheyu Shen; Bowei Tian; Ang Li

arXiv:2508.00912·cs.LG·August 5, 2025

TL;DR

PALACE is a user-side framework that estimates the number of hidden reasoning tokens behind an LLM API response from the prompt-answer pair alone, so users can audit billed token counts without access to the provider's internal traces.

Contribution

The paper introduces a method for estimating reasoning token counts that uses a lightweight domain router to handle the variance in token usage across diverse reasoning tasks.

Findings

1. Achieves low relative error in token estimation across multiple benchmarks.

2. Supports fine-grained cost auditing and inflation detection.

3. Demonstrates effectiveness in math, coding, medical, and general reasoning tasks.
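
The two quantities named in the findings, relative error and inflation detection, can be illustrated with a minimal sketch. This is not PALACE's implementation: the record layout, the function names, and the 25% tolerance are assumptions chosen for illustration, and the estimates are assumed to come from some user-side predictor.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AuditRecord:
    """One API call (hypothetical layout, not taken from the paper)."""
    billed_reasoning_tokens: int     # count reported and charged by the provider
    estimated_reasoning_tokens: int  # user-side estimate from some predictor


def relative_error(estimated: int, actual: int) -> float:
    """Absolute error of the estimate as a fraction of the actual count."""
    if actual <= 0:
        raise ValueError("actual token count must be positive")
    return abs(estimated - actual) / actual


def flag_inflation(records: Iterable[AuditRecord],
                   tolerance: float = 0.25) -> Tuple[bool, float]:
    """Compare aggregate billed tokens with aggregate estimates.

    Returns (flagged, excess), where excess is the fraction by which the
    billed total exceeds the estimated total. The tolerance is an assumed
    value and would need calibrating against the estimator's own error.
    """
    records = list(records)
    total_billed = sum(r.billed_reasoning_tokens for r in records)
    total_estimated = sum(r.estimated_reasoning_tokens for r in records)
    if total_estimated <= 0:
        raise ValueError("estimated total must be positive")
    excess = (total_billed - total_estimated) / total_estimated
    return excess > tolerance, excess
```

Aggregating over many calls before comparing is a design choice in this sketch: per-call estimates are noisy, so a single large gap is weaker evidence of overbilling than a consistent gap across a batch.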

Abstract

Commercial LLM services often conceal internal reasoning traces while still charging users for every generated token, including those from hidden intermediate steps, raising concerns of token inflation and potential overbilling. This gap underscores the urgent need for reliable token auditing, yet achieving it is far from straightforward: cryptographic verification (e.g., hash-based signature) offers little assurance when providers control the entire execution pipeline, while user-side prediction struggles with the inherent variance of reasoning LLMs, where token usage fluctuates across domains and prompt styles. To bridge this gap, we present PALACE (Predictive Auditing of LLM APIs via Reasoning Token Count Estimation), a user-side framework that estimates hidden reasoning token counts from prompt-answer pairs without access to internal traces. PALACE introduces a GRPO-augmented…

Taxonomy

Topics: Advanced Malware Detection Techniques · Web Application Security Vulnerabilities · Security and Verification in Computing