KoCo: Conditioning Language Model Pre-training on Knowledge Coordinates

Yudong Li; Jiawei Cai; Linlin Shen

arXiv:2604.12397·cs.CL·April 15, 2026

TL;DR

KoCo introduces a method to incorporate explicit real-world knowledge context into language model pre-training by mapping documents into semantic coordinates, improving downstream task performance and reducing hallucinations.

Contribution

The paper proposes Knowledge Coordinate Conditioning (KoCo), a novel approach that enhances language models with explicit knowledge context during pre-training.

Findings

01

KoCo improves performance across 10 downstream tasks.

02

Pre-training convergence accelerates by approximately 30%.

03

Model better distinguishes facts from noise, reducing hallucinations.

Abstract

Standard Large Language Model (LLM) pre-training typically treats corpora as flattened token sequences, often overlooking the real-world context that humans naturally rely on to contextualize information. To bridge this gap, we introduce Knowledge Coordinate Conditioning (KoCo), a simple method that maps every document into a three-dimensional semantic coordinate. By prepending these coordinates as textual prefixes for pre-training, we aim to equip the model with explicit contextual awareness to learn the documents within the real-world knowledge structure. Experimental results demonstrate that KoCo significantly enhances performance across 10 downstream tasks and accelerates pre-training convergence by approximately 30%. Furthermore, our analysis indicates that explicitly modeling knowledge coordinates helps the model distinguish stable facts from noise, effectively mitigating…
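
The prefixing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract says only that each document is mapped to a three-dimensional semantic coordinate and that the coordinate is prepended as a textual prefix. The axis names (axis_1, axis_2, axis_3), the bracketed prefix format, and the helper names below are all hypothetical placeholders, and the step that assigns a coordinate to a document is not shown because the listing does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeCoordinate:
    """A three-dimensional coordinate attached to one document.

    The axis names are placeholders: the abstract states that the
    coordinate has three dimensions but this listing does not name them.
    """
    axis_1: str
    axis_2: str
    axis_3: str


def format_prefix(coord: KnowledgeCoordinate) -> str:
    """Render the coordinate as a textual prefix (format is an assumption)."""
    return f"[coord: {coord.axis_1} | {coord.axis_2} | {coord.axis_3}]\n"


def condition_document(text: str, coord: KnowledgeCoordinate) -> str:
    """Prepend the coordinate prefix to the raw document text."""
    return format_prefix(coord) + text


def condition_corpus(docs):
    """Apply the prefix to an iterable of (text, coordinate) pairs."""
    return [condition_document(text, coord) for text, coord in docs]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    example = KnowledgeCoordinate("value_a", "value_b", "value_c")
    print(condition_document("Document body goes here.", example))
```

Because the coordinate is plain text, the conditioned documents can go through an unchanged tokenizer and training loop, which fits the abstract's description of KoCo as a simple method.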
