Beyond Tokens: Concept-Level Training Objectives for LLMs

Laya Iyer; Pranav Somani; Alice Guo; Dan Jurafsky; Chen Shani

arXiv:2601.11791·cs.CL·January 23, 2026

TL;DR

This paper proposes concept-level training objectives for large language models, replacing token-level prediction to better capture semantic meaning, leading to improved robustness and performance.

Contribution

It introduces methods for integrating concept-level supervision into LLM training, moving beyond token-level objectives to enhance semantic understanding.

Findings

01 Lower perplexity on language modeling tasks

02 Improved robustness under domain shifts

03 Stronger performance on NLP benchmarks

Abstract

The next-token prediction (NTP) objective has been foundational in the development of modern large language models (LLMs), driving advances in fluency and generalization. However, NTP operates at the token level, treating deviations from a single reference continuation as errors even when alternative continuations are equally plausible or semantically equivalent (e.g., "mom" vs. "mother"). As a result, token-level loss can penalize valid abstractions, paraphrases, or conceptually correct reasoning paths, biasing models toward surface form rather than underlying meaning. This mismatch between the training signal and semantic correctness motivates learning objectives that operate over higher-level representations. We propose a shift from token-level to concept-level prediction, where concepts group multiple surface forms of the same idea (e.g., "mom," "mommy," "mother"…
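To make the contrast between token-level and concept-level loss concrete, here is a minimal sketch. It is not the authors' implementation: the abstract is truncated before the method is described, so this assumes one simple reading of "concept-level prediction", in which the loss is the negative log of the total probability assigned to every surface form in the target's concept. The vocabulary, the `TOKEN_TO_CONCEPT` mapping, the function names, and the logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_nll(logits, target_id):
    """Standard next-token loss: only the single reference token counts."""
    return -math.log(softmax(logits)[target_id])

def concept_nll(logits, target_id, token_to_concept):
    """Assumed concept-level loss: sum probability over all tokens that
    share the target's concept, then take the negative log.
    Tokens with no concept fall back to the token-level loss."""
    probs = softmax(logits)
    concept = token_to_concept.get(target_id)
    if concept is None:
        return -math.log(probs[target_id])
    mass = sum(p for i, p in enumerate(probs)
               if token_to_concept.get(i) == concept)
    return -math.log(mass)

# Hypothetical toy vocabulary and concept grouping (illustration only).
VOCAB = ["mom", "mommy", "mother", "dog"]
TOKEN_TO_CONCEPT = {0: "MOTHER", 1: "MOTHER", 2: "MOTHER"}

# Hypothetical logits: the model prefers "mother", the reference is "mom".
logits = [1.0, 0.5, 2.0, 0.0]
reference = VOCAB.index("mom")

print("token-level loss:  ", token_nll(logits, reference))
print("concept-level loss:", concept_nll(logits, reference, TOKEN_TO_CONCEPT))
```

Under this assumed formulation, the concept-level loss is never larger than the token-level loss for the same target, because the concept's probability mass includes the reference token's own probability. A model that prefers "mother" when the reference says "mom" is therefore penalized less than it would be under NTP.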

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education