ToW: Thoughts of Words Improve Reasoning in Large Language Models
Zhikun Xu, Ming Shen, Jacob Dineen, Zhaonan Li, Xiao Ye, Shijie Lu, Aswin RRV, Chitta Baral, Ben Zhou

TL;DR
This paper presents ToW, a training-time data-augmentation method that injects fine-grained thoughts about upcoming words into pre-training text, improving large language models' reasoning and reducing hallucinations.
Contribution
The paper introduces ToW, a new training approach that incorporates thoughts of words to improve reasoning and reduce hallucinations in language models, using distillation from larger models.
Findings
Improves reasoning performance by 7% to 9% on average after continual pre-training on 70K ToW annotations.
Reduces model hallucination by up to 10%.
Is task-agnostic and introduces no additional biases.
Abstract
We introduce thoughts of words (ToW), a novel training-time data-augmentation method for next-word prediction. ToW views next-word prediction as a core reasoning task and injects fine-grained thoughts explaining what the next word should be and how it relates to the preceding context in pre-training texts. Our formulation addresses two fundamental drawbacks of existing next-word prediction learning schemes: they induce factual hallucination, and they make it inefficient for models to learn the implicit reasoning processes in raw texts. While there are many ways to acquire such thoughts of words, we explore the first step of acquiring ToW annotations through distilling from larger models. After continual pre-training with only 70K ToW annotations, we effectively improve models' reasoning performance by 7% to 9% on average and reduce model hallucination by up to 10%. At the same time, ToW is…
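To make the augmentation concrete, here is a minimal sketch of what a ToW-style training example might look like: a thought explaining why a word comes next is inserted just before that word, and the raw text is otherwise left unchanged. This assumes whitespace tokenization and hypothetical `<ToW>`/`</ToW>` delimiters, and the thought is written by hand rather than distilled from a larger model; the paper's actual annotation format and word-selection procedure may differ. `inject_thoughts` and `strip_thoughts` are illustrative names, not part of any released code.

```python
import re

# Hypothetical delimiters; the paper's actual annotation format may differ.
THOUGHT_OPEN = "<ToW>"
THOUGHT_CLOSE = "</ToW>"


def inject_thoughts(text: str, thoughts: dict[int, str]) -> str:
    """Insert a thought before each selected word of a whitespace-tokenized text.

    `thoughts` maps a word position to a short explanation of what that word
    should be and how it follows from the preceding context. In ToW such
    thoughts would come from a larger model; here they are supplied by hand.
    """
    words = text.split()  # simplifying assumption: whitespace tokenization
    out = []
    for i, word in enumerate(words):
        if i in thoughts:
            # The thought precedes the word it explains.
            out.append(f"{THOUGHT_OPEN}{thoughts[i]}{THOUGHT_CLOSE}")
        out.append(word)
    return " ".join(out)


def strip_thoughts(augmented: str) -> str:
    """Remove all thought spans, recovering the original word sequence."""
    pattern = re.escape(THOUGHT_OPEN) + r".*?" + re.escape(THOUGHT_CLOSE)
    # Replace each span with a space, then normalize whitespace.
    return " ".join(re.sub(pattern, " ", augmented).split())


if __name__ == "__main__":
    raw = "Alice has 3 apples and buys 2 more, so she has 5 apples."
    # Word 11 is "5"; the thought explains how it follows from the context.
    thoughts = {11: "3 apples plus 2 more gives 3 + 2 = 5."}
    augmented = inject_thoughts(raw, thoughts)
    print(augmented)
    print(strip_thoughts(augmented) == raw)
```

Because thoughts are only inserted, never substituted, stripping them recovers the raw text, so the original next-word targets are kept alongside the added reasoning.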
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
