On the Equivalence Between Auto-Regressive Next Token Prediction and Full-Item-Vocabulary Maximum Likelihood Estimation in Generative Recommendation--A Short Note
Yusheng Huang, Shuang Yang, Zhaojie Liu, Han Li

TL;DR
This paper proves that auto-regressive next-token prediction in generative recommendation systems is mathematically equivalent to full-item-vocabulary maximum likelihood estimation, providing a theoretical foundation for industrial practices.
Contribution
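It formally proves that k-token auto-regressive next-token prediction (AR-NTP) is equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE) in generative recommendation, assuming a bijective mapping between items and their k-token sequences. The result applies to common tokenization schemes.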
Findings
Proves the strict mathematical equivalence between AR-NTP and FV-MLE.
Shows the equivalence holds for both cascaded and parallel tokenizations.
Provides a theoretical basis for optimizing industrial generative recommendation systems.
Abstract
Generative recommendation (GR) has emerged as a widely adopted paradigm in industrial sequential recommendation. Current GR systems follow a similar pipeline: tokenization for item indexing, next-token prediction as the training objective and auto-regressive decoding for next-item generation. However, existing GR research mainly focuses on architecture design and empirical performance optimization, with few rigorous theoretical explanations for the working mechanism of auto-regressive next-token prediction in recommendation scenarios. In this work, we formally prove that the k-token auto-regressive next-token prediction (AR-NTP) paradigm is strictly mathematically equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE), under the core premise of a bijective mapping between items and their corresponding k-token sequences. We further show that this equivalence…
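The equivalence claimed in the abstract can be checked numerically on a toy example. The sketch below is not the paper's proof or code. It assumes a hypothetical setting in which every k-token code over a small token vocabulary corresponds to exactly one item, so the item-to-code mapping is a bijection onto the full code space. The names `make_toy_model`, `token_nll`, and `item_logit`, and the random prefix-indexed logit table standing in for an auto-regressive decoder, are illustrative assumptions. Under these assumptions, the summed per-token cross-entropy for a target item should match the softmax cross-entropy over the whole item vocabulary.

```python
import itertools
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_toy_model(vocab_size, k, seed=0):
    """Hypothetical stand-in for an AR decoder: random logits per token prefix."""
    rng = random.Random(seed)
    table = {}
    for length in range(k):
        for prefix in itertools.product(range(vocab_size), repeat=length):
            table[prefix] = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    return table


def token_nll(model, code):
    """AR-NTP loss for one item: sum of per-token cross-entropies."""
    loss = 0.0
    for j, tok in enumerate(code):
        probs = softmax(model[code[:j]])  # conditional over the j-th token
        loss -= math.log(probs[tok])
    return loss


def item_logit(model, code):
    """Item-level score: sum of token log-probabilities (chain rule)."""
    return -token_nll(model, code)


vocab_size, k = 3, 2
# Assumption: each of the vocab_size**k codes indexes exactly one item.
codes = list(itertools.product(range(vocab_size), repeat=k))
model = make_toy_model(vocab_size, k)

logits = [item_logit(model, c) for c in codes]
total_mass = sum(math.exp(z) for z in logits)  # item-level normalizer

target = codes[4]
ar_ntp_loss = token_nll(model, target)
# Full-item-vocabulary softmax cross-entropy for the same target item.
fv_mle_loss = -item_logit(model, target) + math.log(total_mass)

print(f"total probability mass over items: {total_mass:.12f}")
print(f"AR-NTP loss: {ar_ntp_loss:.12f}  FV-MLE loss: {fv_mle_loss:.12f}")
```

In this toy setting the item-level normalizer comes out as 1 up to floating-point error, which is why the two losses coincide. If only some of the codes were assigned to items, part of the probability mass would fall on unassigned codes and the two quantities would differ by the log of the remaining mass.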
