KoCo: Conditioning Language Model Pre-training on Knowledge Coordinates
Yudong Li, Jiawei Cai, Linlin Shen

TL;DR
KoCo introduces a method to incorporate explicit real-world knowledge context into language model pre-training by mapping documents into semantic coordinates, improving downstream task performance and reducing hallucinations.
Contribution
The paper proposes Knowledge Coordinate Conditioning (KoCo), a novel approach that enhances language models with explicit knowledge context during pre-training.
Findings
KoCo improves performance across 10 downstream tasks.
Pre-training convergence accelerates by approximately 30%.
The model better distinguishes stable facts from noise, which reduces hallucinations.
Abstract
Standard Large Language Model (LLM) pre-training typically treats corpora as flattened token sequences, often overlooking the real-world context that humans naturally rely on to contextualize information. To bridge this gap, we introduce Knowledge Coordinate Conditioning (KoCo), a simple method that maps every document into a three-dimensional semantic coordinate. By prepending these coordinates as textual prefixes for pre-training, we aim to equip the model with explicit contextual awareness so that it learns each document within the real-world knowledge structure. Experimental results demonstrate that KoCo significantly enhances performance across 10 downstream tasks and accelerates pre-training convergence by approximately 30%. Furthermore, our analysis indicates that explicitly modeling knowledge coordinates helps the model distinguish stable facts from noise, effectively mitigating…
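The prefix-conditioning idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the abstract says each document gets a three-dimensional coordinate prepended as a textual prefix, but it does not state what the three axes are or how the prefix is formatted, so the field names `axis_1`, `axis_2`, `axis_3`, the bracketed prefix layout, and the helper `condition_document` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeCoordinate:
    """A three-part document coordinate.

    The axis names are placeholders: the summary only states that the
    coordinate is three-dimensional, not what each dimension means.
    """
    axis_1: str
    axis_2: str
    axis_3: str

    def as_prefix(self) -> str:
        # Hypothetical textual layout; the paper's actual format is not given here.
        return f"[coord: {self.axis_1} | {self.axis_2} | {self.axis_3}]\n"


def condition_document(text: str, coord: KnowledgeCoordinate) -> str:
    """Prepend the coordinate prefix so it precedes the document's tokens."""
    return coord.as_prefix() + text


if __name__ == "__main__":
    coord = KnowledgeCoordinate("a", "b", "c")
    print(condition_document("Water boils at 100 C at sea level.", coord))
```

In a pre-training pipeline, a step like `condition_document` would presumably run before tokenization, so the model sees the coordinate as ordinary leading text for every document.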
