Identified-Set Geometry of Distributional Model Extraction under Top-$K$ Censored API Access

Wenhua Nie; ZiCheng Zhu; Jianan Wu; Binhan Luo; Haoran Zheng; Jyh-Shing Roger Jang

arXiv:2605.10407·cs.LG·May 12, 2026

TL;DR

This paper analyzes the limits of extracting distributional information from large language model APIs that only reveal top-$K$ logit scores, showing how much private capability can still be recovered despite censorship.

Contribution

It introduces a geometric framework for understanding the identified set of distributions under top-$K$ censoring and quantifies the recovery limits for distributional and KL divergence measures.

Findings

01 On-task top-$K$ distillation recovers 12% of private capability.

02 Full-logit distillation recovers 56% of private capability, despite 99% KL closure.

03 Generation-based extraction recovers 96% of private capability.

Abstract

Modern LLM APIs often reveal only top-$K$ logit scores and censor the remaining vocabulary. We study the per-position distribution-recovery limits of this access model. For censoring threshold $\tau$, the compatible teacher distributions form an identified set whose total-variation diameter is exactly $U_{K} = (V - K) \exp(\tau) / (Z_{A} + (V - K) \exp(\tau))$, where $Z_{A}$ is the observed partition function. For KL recovery, we give a computable binary-endpoint lower bound and an asymptotically matching small-ambiguity upper bound, with an extension to reference-aware attackers. Experiments on a Qwen3 math-reasoning teacher reveal a layered extraction hierarchy: on-task top-$K$ distillation recovers 12% of private capability, full-logit distillation recovers 56% despite 99% KL closure, and generation-based extraction recovers 96%. Top-$K$ censoring therefore limits per-position distribution recovery…
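To make the diameter formula in the abstract concrete, the following is a minimal Python sketch that evaluates $U_{K}$ for a single position. It rests on assumptions not spelled out in the abstract: that $Z_{A}$ is the sum of $\exp(\cdot)$ over the $K$ revealed logits, and that the threshold $\tau$ is supplied by the caller. The function name `tv_diameter` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def tv_diameter(top_k_logits, tau, vocab_size):
    """Evaluate U_K = (V-K) exp(tau) / (Z_A + (V-K) exp(tau)).

    Assumption: Z_A is the sum of exp(logit) over the K revealed logits.
    All names here are illustrative, not taken from the paper.
    """
    k = len(top_k_logits)
    n_hidden = vocab_size - k  # V - K censored tokens
    if k == 0 or n_hidden < 0:
        raise ValueError("need 1 <= K <= V")
    if n_hidden == 0:
        return 0.0  # nothing is censored, so there is no ambiguity

    # Subtract a common shift before exponentiating to avoid overflow;
    # the shift cancels in the ratio.
    shift = max(max(top_k_logits), tau)
    z_a = sum(math.exp(x - shift) for x in top_k_logits)
    hidden_mass = n_hidden * math.exp(tau - shift)
    return hidden_mass / (z_a + hidden_mass)
```

For instance, with two revealed logits of 0.0, a threshold of 0.0, and a vocabulary of four tokens, `tv_diameter([0.0, 0.0], 0.0, 4)` evaluates to 0.5: the censored tokens could hold up to half of the total mass. Under the stated assumptions, the value grows with the number of censored tokens and with $\tau$, and falls to zero when every logit is revealed.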
