Beyond Tokens: Concept-Level Training Objectives for LLMs
Laya Iyer, Pranav Somani, Alice Guo, Dan Jurafsky, Chen Shani

TL;DR
This paper proposes concept-level training objectives for large language models in place of token-level prediction, so that the training signal tracks semantic meaning rather than surface form, with reported gains in robustness and performance.
Contribution
It introduces methods for integrating concept-level supervision into LLM training, moving beyond token-level objectives to enhance semantic understanding.
Findings
Lower perplexity on language modeling tasks
Improved robustness under domain shifts
Stronger performance on NLP benchmarks
Abstract
The next-token prediction (NTP) objective has been foundational in the development of modern large language models (LLMs), driving advances in fluency and generalization. However, NTP operates at the token level, treating deviations from a single reference continuation as errors even when alternative continuations are equally plausible or semantically equivalent (e.g., "mom" vs. "mother"). As a result, token-level loss can penalize valid abstractions, paraphrases, or conceptually correct reasoning paths, biasing models toward surface form rather than underlying meaning. This mismatch between the training signal and semantic correctness motivates learning objectives that operate over higher-level representations. We propose a shift from token-level to concept-level prediction, where concepts group multiple surface forms of the same idea (e.g., "mom," "mommy," "mother"…
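The contrast between token-level and concept-level loss can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is cut off before the method is described, so the sketch assumes one simple instantiation, in which the probability of a concept is the summed probability of its member tokens. The vocabulary, the concept label MOTHER, the logits, and the function names token_nll and concept_nll are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_nll(logits, target_id):
    """Standard next-token loss: -log p(target token)."""
    return -math.log(softmax(logits)[target_id])

def concept_nll(logits, target_id, concept_of, members):
    """Assumed concept-level loss: -log of the total probability mass on
    every token that shares the target's concept. Tokens without a
    concept fall back to the ordinary token-level loss."""
    probs = softmax(logits)
    concept = concept_of.get(target_id)
    ids = members[concept] if concept is not None else [target_id]
    return -math.log(sum(probs[i] for i in ids))

# Hypothetical toy vocabulary and concept grouping (illustration only).
vocab = ["mom", "mommy", "mother", "dog"]
concept_of = {0: "MOTHER", 1: "MOTHER", 2: "MOTHER"}
members = {"MOTHER": [0, 1, 2]}

# The model prefers "mother", but the reference continuation is "mom".
logits = [1.0, 0.5, 2.0, 0.0]
tok = token_nll(logits, target_id=0)
con = concept_nll(logits, 0, concept_of, members)
print(f"token-level loss:   {tok:.3f}")
print(f"concept-level loss: {con:.3f}")
```

Under this assumption the concept-level loss is never larger than the token-level loss for the same target, because the summed mass includes the target token's own probability. A model that predicts "mother" when the reference says "mom" is therefore penalized less than under NTP.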
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
