Predicting Through Generation: Why Generation Is Better for Prediction
Md Kowsher, Nusrat Jahan Prottasha, Prakash Bhat, Chun-Nam Yu, Mojtaba Soltanalian, Ivan Garibay, Ozlem Garibay, Chen Chen, Niloofar Yousefi

TL;DR
This paper demonstrates, with theoretical and empirical evidence, that token-level generation is more effective for prediction tasks than pooled representations. It also introduces PredGen, a framework that addresses exposure bias and format mismatch to improve structured prediction performance.
Contribution
The paper introduces PredGen, a novel end-to-end framework that enhances generation-based prediction by addressing exposure bias and format mismatch, with a new alignment loss for better accuracy.
Findings
PredGen outperforms standard baselines on multiple benchmarks.
Token-level generation retains more mutual information for prediction.
Scheduled sampling reduces exposure bias effectively.
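Scheduled sampling is a standard remedy for exposure bias. During training, the decoder is sometimes fed its own previous prediction instead of the ground-truth token, and the probability of using the ground truth decays as training progresses. The sketch below shows only this generic mechanism. The summary does not give PredGen's exact schedule, so the inverse-sigmoid decay, the function names, and the toy `always_zero` model are all assumptions for illustration.

```python
import math
import random
from typing import Callable, List


def inverse_sigmoid_teacher_prob(step: int, k: float = 100.0) -> float:
    """Assumed schedule: teacher-forcing probability k / (k + exp(step / k)).

    Starts near 1 and decays toward 0 as the training step grows.
    """
    return k / (k + math.exp(step / k))


def build_decoder_inputs(
    targets: List[int],
    predict_next: Callable[[List[int]], int],
    teacher_prob: float,
    bos: int,
    rng: random.Random,
) -> List[int]:
    """Build the decoder input sequence for one training example.

    inputs[t] is the token consumed when predicting targets[t]. For t >= 1 it is
    the ground-truth targets[t - 1] with probability teacher_prob; otherwise it is
    the model's own prediction for position t - 1 given the prefix built so far.
    """
    inputs = [bos]
    for t in range(1, len(targets)):
        if rng.random() < teacher_prob:
            inputs.append(targets[t - 1])      # teacher forcing
        else:
            inputs.append(predict_next(inputs))  # model's own prediction
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [5, 7, 9, 2]
    # Hypothetical stand-in for a trained decoder: always predicts token 0.
    always_zero = lambda prefix: 0
    for step in (0, 500, 2000):
        p = inverse_sigmoid_teacher_prob(step)
        print(step, round(p, 3), build_decoder_inputs(targets, always_zero, p, bos=1, rng=rng))
```

With `teacher_prob = 1.0` this reduces to ordinary teacher forcing. With `teacher_prob = 0.0` the decoder sees only its own outputs, which matches the inference-time condition.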
Abstract
This paper argues that generating output tokens is more effective than using pooled representations for prediction tasks, because token-level generation retains more mutual information. Since LLMs are trained on massive text corpora with next-token prediction, generation aligns naturally with their learned behavior. Using the Data Processing Inequality (DPI), we provide both theoretical and empirical evidence supporting this claim. However, autoregressive models face two key challenges when used for prediction: (1) exposure bias, where the model sees ground-truth tokens during training but must rely on its own predictions during inference, which leads to errors; and (2) format mismatch, where discrete tokens do not always align with the task's required output structure. To address these challenges, we introduce PredGen (Predicting Through Generating), an end-to-end framework that (i) uses…
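For reference, the DPI states that for any Markov chain Y → X → Z, I(Y; Z) ≤ I(Y; X): post-processing X cannot increase the information it carries about Y. A hedged reading of the abstract's argument is that a pooled vector acts as such a post-processing step, though this may not match the paper's exact formulation. The toy check below illustrates only the inequality itself, not the paper's proof or its empirical measurements. The distribution, the merging function `g` (a hypothetical stand-in for pooling), and the helper names are assumptions.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]


def mutual_information(joint: Joint) -> float:
    """I(A; B) in bits for a discrete joint distribution {(a, b): p}."""
    pa: Dict[Hashable, float] = defaultdict(float)
    pb: Dict[Hashable, float] = defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def push_forward(joint: Joint, g: Callable[[Hashable], Hashable]) -> Joint:
    """Joint of (Y, g(X)) from the joint of (Y, X); g is a deterministic, lossy summary."""
    out: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for (y, x), p in joint.items():
        out[(y, g(x))] += p
    return dict(out)


# Toy joint of a label Y and a fine-grained representation X (X determines Y).
joint_yx: Joint = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "c"): 0.25, (1, "d"): 0.25}

# Hypothetical "pooling": merges b and c, discarding the distinction between them.
g = lambda x: "bc" if x in ("b", "c") else x
joint_yz = push_forward(joint_yx, g)

if __name__ == "__main__":
    print("I(Y; X) =", mutual_information(joint_yx))
    print("I(Y; g(X)) =", mutual_information(joint_yz))
```

Here X fully determines Y, so I(Y; X) equals H(Y), which is 1 bit. Merging b and c makes those outcomes uninformative about Y, so I(Y; g(X)) drops to 0.5 bits. This is consistent with the DPI.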
Taxonomy
Topics: Machine Learning and Data Classification · Simulation Techniques and Applications · Anomaly Detection Techniques and Applications
Methods: ALIGN · Adapter
