The Role of Generator Access in Autoregressive Post-Training
Amit Kiran Rege

TL;DR
This paper studies how the level of generator access constrains autoregressive post-training: whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. Changing only this interface can create an exponential gap.
Contribution
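It shows that in the root-start regime several seemingly different observation types reduce to one canonical experiment, that weak prefix control breaks the resulting barrier, and that the generator interface alone can produce an exponential gap for KL-regularized outcome-reward post-training.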
Findings
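Root-start access is limited by the on-policy probability of reaching informative prefixes
Once prefix control is available, richer observations (conditional sampling, logits) can outperform top-k access
Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training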
Abstract
We study how generator access constrains autoregressive post-training. The central question is whether the learner is confined to fresh root-start rollouts or can return to previously built prefixes and query the next-token rule there. In the root-start regime, output sampling, generated-token log probabilities, top-k reports, and full next-token distributions along sampled trajectories all reduce to one canonical experiment, limited by the on-policy probability of reaching informative prefixes. Weak prefix control breaks this barrier, and once control is available, richer observations such as conditional sampling or logits can outperform top-k access. Changing only the generator interface creates an exponential gap for KL-regularized outcome-reward post-training.
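The difference between the two access regimes can be illustrated with a toy sketch. It is not the paper's construction: the uniform binary generator, the class `ToyGenerator`, and the helpers `reach_probability` and `rollouts_until_reached` are all hypothetical names chosen for illustration. Under root-start access the learner only sees a target prefix if a fresh rollout happens to pass through it, which for a uniform binary generator has probability 2^-k for a length-k prefix; under prefix control the learner can query the next-token rule at that prefix directly.

```python
import random


class ToyGenerator:
    """Hypothetical uniform binary autoregressive generator (illustration only)."""

    def __init__(self, horizon, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def next_token_probs(self, prefix):
        # Assumed next-token rule: uniform over {0, 1} at every prefix.
        return {0: 0.5, 1: 0.5}

    def _sample_next(self, prefix):
        probs = self.next_token_probs(prefix)
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        return self.rng.choices(tokens, weights=weights)[0]

    def rollout_from_root(self):
        # Root-start access: every query is a fresh trajectory from the empty prefix.
        prefix = ()
        while len(prefix) < self.horizon:
            prefix = prefix + (self._sample_next(prefix),)
        return prefix

    def query_at_prefix(self, prefix):
        # Prefix-control access: sample the next token at a prefix the learner chooses.
        return self._sample_next(tuple(prefix))


def reach_probability(gen, target_prefix):
    """On-policy probability that a root-start rollout passes through target_prefix."""
    p = 1.0
    for i, tok in enumerate(target_prefix):
        p *= gen.next_token_probs(target_prefix[:i])[tok]
    return p


def rollouts_until_reached(gen, target_prefix, max_rollouts):
    """Count root-start rollouts until one passes through target_prefix (None if never)."""
    k = len(target_prefix)
    for i in range(1, max_rollouts + 1):
        if gen.rollout_from_root()[:k] == tuple(target_prefix):
            return i
    return None


if __name__ == "__main__":
    gen = ToyGenerator(horizon=10, seed=0)
    target = (1,) * 9
    print("reach probability:", reach_probability(gen, target))
    print("rollouts needed:", rollouts_until_reached(gen, target, max_rollouts=10_000))
    print("direct query at prefix:", gen.query_at_prefix(target))
```

In this sketch a single `query_at_prefix` call gives information at the target prefix, while the root-start learner needs on the order of 2^k rollouts to reach it even once. This is only meant to convey the flavor of the reachability barrier described in the abstract, not its formal statement.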
