Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages

Brandon Cui; Ximing Lu; Jaehun Jung; Syeda Nahida Akter; Hyunwoo Kim; Yuxiao Qu; David Acuna; Shrimai Prabhumoye; Yejin Choi; Prithviraj Ammanabrolu

arXiv:2605.20285·cs.LG·May 21, 2026

TL;DR

Introspective Training (IXT) uses feedback-conditioned learning to improve the efficiency and performance of large language models across all training stages, enabling better scaling and domain-specific capabilities.

Contribution

The paper introduces IXT, a feedback-conditioned training method that uses signals from later pipeline stages, such as post-training, to inform earlier stages such as pre-training. It is applicable to any stage of training.

Findings

01. Up to 2.8x greater compute efficiency.

02. Models reach higher performance in the math and code domains.

03. Effective for dense transformer LLMs from 7.5B to 12B parameters trained from scratch.

Abstract

We tackle the question of how to scale more efficiently across the many, ever-growing stages of current LLM training pipelines. Our guiding intuition stems from the fact that the dynamics of later stages of the pipeline, e.g. post-training, can be used to inform earlier stages such as pre-training. To this end, we propose Introspective Training (or IXT), inspired by offline reward-conditioned reinforcement learning and applicable to any stage of training. IXT uses a thinking reward model to annotate data with natural-language, critique-based feedback, enabling quality-aware training from the earliest stages of the pipeline. Models are then trained by prefix-conditioning the data with the generated feedback, ensuring that not all tokens are treated equally starting much earlier in training than usual. Comprehensive experiments on 7.5-12B transformer-based dense LLMs trained from scratch…
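As a rough illustration of what prefix-conditioning on feedback could look like at the data level, here is a minimal sketch. The abstract does not give the actual template, so the delimiter strings, the `AnnotatedExample` type, and the function names are all hypothetical, and the critique is assumed to have been produced already by some reward model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical delimiters: the template actually used by IXT is not
# specified in the abstract.
FEEDBACK_OPEN = "<feedback>"
FEEDBACK_CLOSE = "</feedback>"


@dataclass
class AnnotatedExample:
    text: str       # raw training document
    feedback: str   # natural-language critique, assumed to come from a reward model


def prefix_condition(example: AnnotatedExample) -> str:
    """Prepend the critique to the document so the model reads it first."""
    critique = example.feedback.strip()
    return f"{FEEDBACK_OPEN}{critique}{FEEDBACK_CLOSE}\n{example.text}"


def build_corpus(examples: List[AnnotatedExample]) -> List[str]:
    """Turn annotated examples into prefix-conditioned training strings."""
    return [prefix_condition(ex) for ex in examples]
```

The resulting strings would then be tokenized and used in place of the raw documents. How the feedback tokens are weighted in the loss, and how the critique is phrased, are details the abstract leaves open.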
