LLM Pretraining with Continuous Concepts
Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li

TL;DR
CoCoMix introduces a novel pretraining framework that combines discrete token prediction with continuous concepts, improving efficiency, performance, and interpretability of large language models across various tasks.
Contribution
It proposes a new pretraining method that integrates continuous concepts learned from autoencoders with token prediction, enhancing model efficiency and interpretability.
Findings
Outperforms standard next token prediction on multiple benchmarks.
Increases sample efficiency in language modeling and reasoning tasks.
Enhances interpretability and steerability of language models.
Abstract
Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse autoencoder and mixes them into the model's hidden state by interleaving with token hidden representations. Through experiments on multiple benchmarks, including language modeling and downstream reasoning tasks, we show that CoCoMix is more sample efficient and consistently outperforms standard next token prediction, knowledge distillation and inserting pause tokens. We find that combining both concept learning and interleaving in an end-to-end framework is critical to…
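The interleaving step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `interleave_concepts`, `w_pred`, and `w_mix` are hypothetical, and the two linear maps (one predicting concept activations from a hidden state, one projecting them back to the hidden dimension) are assumptions made only to show how a concept vector could be placed after each token's hidden state.

```python
import numpy as np


def interleave_concepts(hidden, w_pred, w_mix):
    """Interleave token hidden states with predicted concept vectors.

    hidden: (T, d) token hidden states.
    w_pred: (d, C) hypothetical concept-prediction head (assumption).
    w_mix:  (C, d) hypothetical projection back to hidden size (assumption).
    Returns the predicted concept activations (T, C) and the mixed
    sequence (2T, d) laid out as [h_1, c_1, h_2, c_2, ...].
    """
    concept_logits = hidden @ w_pred          # (T, C) predicted concept activations
    concept_vectors = concept_logits @ w_mix  # (T, d) continuous concept vectors

    seq_len, dim = hidden.shape
    mixed = np.empty((2 * seq_len, dim), dtype=hidden.dtype)
    mixed[0::2] = hidden           # even positions: token hidden states
    mixed[1::2] = concept_vectors  # odd positions: concept vectors
    return concept_logits, mixed


# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))   # T=4 tokens, d=8
w_pred = rng.standard_normal((8, 16))  # C=16 concepts
w_mix = rng.standard_normal((16, 8))
concept_logits, mixed = interleave_concepts(hidden, w_pred, w_mix)
print(concept_logits.shape, mixed.shape)
```

In an actual training setup the predicted concept activations would presumably be supervised against targets derived from the pretrained sparse autoencoder, alongside the usual next token loss; that part is omitted here because the abstract does not specify it.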
Peer Reviews
Decision: ICLR 2026 Poster
The main strengths of the paper: 1. Great pre-training analysis with a novel architecture. 2. A working pre-training recipe that seems to improve performance. 3. A way of introducing steerable concepts into the model's generation. This can open the door to a lot of interesting research.
Main weaknesses: 1. Model sizes are limited (but understandable). 2. Hyperparameter tuning was not discussed in detail, and perhaps some of the scores can be attributed to poor hyperparameters.
1. The CoCoMix model is well described, and the method is easy to follow. 2. The proposal of combining next token prediction with continuous concepts in the pretraining paradigm is novel. This idea of integrating an interpretability mechanism (concept) into pretraining frameworks through SAE comes with significant originality. Such pretraining innovations remain rare in the field, and the model's effectiveness suggests potential incremental impact. 3. This work includes a range of experiments
1. Concept Interpretability: This work rests on the assumption that the latent representation layer in the SAE corresponds to human-interpretable concepts. Since this is central to the interpretability claims, more content addressing this assumption would be helpful, beyond what is described in the steerability section. How exactly does CoCoMix capture a real, continuous mixture of concepts? 2. Model Design Justifications: Some architectural choices could be better justified if the authors a
The topic is highly interesting and might have a broad impact on the LLM community. I like the figures, which are clear and improve the readability of this paper. The reviewer appreciates the authors for doing this. In the experiments, the analysis is solid and comprehensive.
[1] To be honest, after reading Figure 1 alone or combined with the text in the introduction, the reviewer is still confused about how the extracted concepts benefit the next token prediction. More explanations might be helpful. [2] During the target concept selection process, is the attribution conducted in each training batch? [3] Another clarification question: after the concept selection, why is it necessary to conduct concept prediction? [4] In experiments, the model sizes a
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Knowledge Distillation · Sparse Autoencoder
