Rho-1: Not All Tokens Are What You Need

Zhenghao Lin; Zhibin Gou; Yeyun Gong; Xiao Liu; Yelong Shen; Ruochen Xu; Chen Lin; Yujiu Yang; Jian Jiao; Nan Duan; Weizhu Chen

arXiv:2404.07965 · cs.CL · January 9, 2025 · 1 citation

TL;DR

Rho-1 introduces Selective Language Modeling (SLM), which scores pretraining tokens with a reference model and focuses the training loss on the higher-scoring, useful tokens. Compared with training on every token, this improves efficiency and accuracy on math and other diverse tasks.

Contribution

The paper proposes Selective Language Modeling, a training approach that focuses the loss on useful tokens instead of applying it uniformly to every token, improving both language model performance and training efficiency.

Findings

01  Up to 30% absolute improvement in few-shot accuracy across 9 math tasks

02  State-of-the-art results on the MATH dataset with fewer tokens

03  6.8% average performance boost across diverse tasks

Abstract

Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines the token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continually pretrained on the 15B-token OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% on 9 math tasks.…
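
The scoring-and-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`token_losses`, `selective_loss`), the `keep_ratio` parameter, and the specific scoring rule (training-model loss minus reference-model loss per token) are assumptions made for the example; the abstract only states that tokens are scored with a reference model and that the loss is focused on higher-scoring tokens.

```python
import math


def token_losses(target_probs):
    """Per-token cross-entropy, given the probability a model assigns
    to the true next token at each position."""
    return [-math.log(p) for p in target_probs]


def selective_loss(train_probs, ref_probs, keep_ratio=0.5):
    """Hypothetical sketch of a selective language-modeling loss.

    train_probs: probabilities the model being trained gives to each true token.
    ref_probs:   probabilities a reference model gives to the same tokens.
    keep_ratio:  assumed hyperparameter, the fraction of tokens kept in the loss.
    """
    train_loss = token_losses(train_probs)
    ref_loss = token_losses(ref_probs)

    # Assumed scoring rule: a token scores high when the training model
    # still finds it hard but the reference model finds it easy.
    scores = [t - r for t, r in zip(train_loss, ref_loss)]

    # Keep the top-scoring fraction of tokens (at least one).
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])

    # Average the training loss over the kept tokens only; all other
    # tokens contribute nothing to the objective.
    loss = sum(train_loss[i] for i in kept) / k
    return loss, kept


# Toy sequence of four tokens.
example_loss, example_kept = selective_loss(
    train_probs=[0.9, 0.1, 0.5, 0.2],
    ref_probs=[0.95, 0.8, 0.4, 0.1],
    keep_ratio=0.5,
)
print(example_kept, round(example_loss, 4))
```

In a real training loop the per-token losses would come from the two models' logits, and the selection would be expressed as a mask over the unreduced loss tensor so that gradients flow only through the kept positions.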

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis