xGen-small Technical Report
Erik Nijkamp, Bo Pang, Egor Pakhomov, Akash Gokul, Jin Qu, Silvio Savarese, Yingbo Zhou, Caiming Xiong

TL;DR
xGen-small is a family of 4B and 9B Transformer decoder models designed for long-context tasks. It combines domain-balanced, frequency-aware data curation, multi-stage pre-training, and targeted post-training to achieve strong performance on math, coding, and long-context benchmarks.
Contribution
The paper introduces an end-to-end pipeline for training long-context Transformer models that combines data curation, multi-stage pre-training, and targeted post-training.
Findings
Strong performance in math and coding tasks
Excels at long context benchmarks
Effective long-context modeling up to 128k tokens
Abstract
We introduce xGen-small, a family of 4B and 9B Transformer decoder models optimized for long-context applications. Our vertically integrated pipeline unites domain-balanced, frequency-aware data curation; multi-stage pre-training with quality annealing and length extension to 128k tokens; and targeted post-training via supervised fine-tuning, preference learning, and online reinforcement learning. xGen-small delivers strong performance across various tasks, especially in math and coding domains, while excelling at long context benchmarks.
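The staged recipe in the abstract (pre-training, then quality annealing, then length extension to 128k tokens) can be pictured as an ordered list of stages. The sketch below is illustrative only and is not the authors' configuration: the `Stage` class, the `build_schedule` helper, the 4,096-token starting length, the doubling rule, and the data-mix shares are all hypothetical, and 128k is assumed to mean 128 × 1024 tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    max_seq_len: int           # context window (tokens) used in this stage
    high_quality_share: float  # hypothetical share of sampling mass on top-quality data


def build_schedule(
    base_len: int = 4096,          # hypothetical starting context length
    target_len: int = 128 * 1024,  # "128k tokens", assuming k = 1024
    base_share: float = 0.3,       # hypothetical data-mix values
    annealed_share: float = 0.8,
) -> List[Stage]:
    """Return an ordered stage list: pre-train, anneal on quality, then extend length."""
    if base_len <= 0 or target_len < base_len:
        raise ValueError("need 0 < base_len <= target_len")

    stages = [
        Stage("pretrain", base_len, base_share),
        Stage("quality_anneal", base_len, annealed_share),
    ]
    seq_len = base_len
    # Double the context window each stage until the target is reached.
    while seq_len < target_len:
        seq_len = min(seq_len * 2, target_len)
        stages.append(Stage(f"extend_{seq_len}", seq_len, annealed_share))
    return stages


for stage in build_schedule():
    print(stage.name, stage.max_seq_len, stage.high_quality_share)
```

Growing the window gradually rather than jumping straight to the target is one common design choice for length extension; the abstract does not state which step sizes the authors used.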
Code & Models
- 🤗 Salesforce/xgen-small-r (model) · ♡ 13
- 🤗 Salesforce/xgen-small-4B-base-r (model) · 18 dl · ♡ 2
- 🤗 Salesforce/xgen-small-9B-base-r (model) · 15 dl · ♡ 2
- 🤗 Salesforce/xgen-small-4B-instruct-r (model) · 96 dl · ♡ 4
- 🤗 Salesforce/xgen-small-9B-instruct-r (model) · 19 dl · ♡ 10
- 🤗 lucyknada/Salesforce_xgen-small-9B-instruct-r-exl3 (model)
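The Salesforce checkpoints listed above appear to follow the pattern `Salesforce/xgen-small-{size}-{variant}-r`. A minimal sketch for loading one with the Hugging Face `transformers` library follows. It assumes `transformers` is installed and that the repositories work with `AutoModelForCausalLM`, which this page does not confirm; the helper names `repo_id` and `load_xgen_small` are hypothetical.

```python
def repo_id(size: str, variant: str) -> str:
    """Build a repository name from the naming pattern seen in the list above."""
    if size not in {"4B", "9B"}:
        raise ValueError(f"unknown size: {size!r}")
    if variant not in {"base", "instruct"}:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"Salesforce/xgen-small-{size}-{variant}-r"


def load_xgen_small(size: str = "4B", variant: str = "instruct"):
    """Download the tokenizer and model weights (requires network access)."""
    # Imported lazily so repo_id() stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, variant)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model
```

Calling `load_xgen_small("9B", "base")` would fetch the 9B base checkpoint. Check the model card for license terms and any extra loading arguments before relying on this.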
Taxonomy
Topics: Advanced Neural Network Applications · Wireless Signal Modulation Classification · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Absolute Position Encodings
