The Role of Generator Access in Autoregressive Post-Training

Amit Kiran Rege

arXiv:2604.04855·cs.LG·April 7, 2026

TL;DR

This paper studies how the kind of access a learner has to an autoregressive generator constrains post-training: whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there.

Contribution

It shows that changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training, and that even weak prefix control breaks the barrier faced by root-start access.

Findings

01

Root-start access is limited by the on-policy probability of reaching informative prefixes

02

Weak prefix control breaks the root-start barrier, after which richer observations (conditional sampling, logits) can outperform top-1 access

03

Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training

Abstract

We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-$k$ reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-$1$ access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
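
The difference between the two access regimes can be illustrated with a small sketch. It is not the paper's construction: it assumes a hypothetical toy generator that is uniform over a binary vocabulary at every prefix, and all function names (`next_token_probs`, `reach_probability`, `root_start_rollout`, `rollouts_until_reached`, `prefix_query`) are invented for illustration. Under that assumption, a root-start learner passes through a fixed depth-$d$ prefix with probability $2^{-d}$, so it needs about $2^{d}$ rollouts on average to observe anything there, while a learner with prefix control can query that prefix directly.

```python
import random


def next_token_probs(prefix):
    """Hypothetical toy generator: uniform over tokens {0, 1} at every prefix."""
    return {0: 0.5, 1: 0.5}


def reach_probability(prefix):
    """On-policy probability that a root-start rollout passes through `prefix`."""
    prob = 1.0
    for t, token in enumerate(prefix):
        prob *= next_token_probs(prefix[:t])[token]
    return prob


def root_start_rollout(length, rng):
    """Root-start access: sample a fresh trajectory from the empty prefix."""
    tokens = []
    for _ in range(length):
        probs = next_token_probs(tuple(tokens))
        tokens.append(0 if rng.random() < probs[0] else 1)
    return tuple(tokens)


def rollouts_until_reached(prefix, rng, max_rollouts=1_000_000):
    """Count root-start rollouts until one passes through `prefix`.

    Returns None if the prefix is not reached within `max_rollouts`.
    """
    for n in range(1, max_rollouts + 1):
        if root_start_rollout(len(prefix), rng) == prefix:
            return n
    return None


def prefix_query(prefix):
    """Prefix control: return to a chosen prefix and read its next-token rule."""
    return next_token_probs(prefix)


if __name__ == "__main__":
    rng = random.Random(0)
    for depth in (4, 8, 12):
        target = (1,) * depth  # stand-in for an "informative" prefix
        p = reach_probability(target)
        used = rollouts_until_reached(target, rng)
        print(f"depth={depth} reach_prob={p:.6f} "
              f"expected_rollouts={1 / p:.0f} observed_rollouts={used}")
        print("  one prefix query returns:", prefix_query(target))
```

In this toy setting the number of root-start rollouts grows exponentially with the depth of the target prefix, whereas the prefix-controlled query costs one call regardless of depth. The sketch only shows the reachability bottleneck; it does not model rewards, KL regularization, or the weaker observation types (top-$k$ reports, generated-token log probabilities) discussed in the abstract.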
