Phi-4 Technical Report
Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa

TL;DR
Phi-4 is a large language model that emphasizes data quality and synthetic data integration, surpassing its teacher model in STEM reasoning tasks through innovative training and data strategies.
Contribution
The paper introduces phi-4, a 14-billion parameter model that improves upon previous models by focusing on data quality and synthetic data, achieving superior reasoning capabilities.
Findings
Outperforms GPT-4 on STEM QA tasks
Achieves strong reasoning performance relative to size
Utilizes innovative data-generation and post-training techniques
Abstract
We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
Code & Models
- 🤗 microsoft/phi-4 · model · 741k dl · ♡ 2224
- 🤗 Pinkstack/SuperThoughts-CoT-14B-16k-o1-QwQ-GGUF · model · 66 dl · ♡ 2
- 🤗 p-e-w/phi-4-heretic · model · 116 dl · ♡ 8
- 🤗 sjster/test_v2_medium · model · 3 dl
- 🤗 vincentoh/phi-4_f16_ollama · model · 4 dl
- 🤗 GPT4All-Community/phi-4-GGUF · model · 155 dl
- 🤗 unsloth/phi-4 · model · 25k dl · ♡ 90
- 🤗 unsloth/phi-4-bnb-4bit · model · 4.0k dl · ♡ 16
- 🤗 unsloth/phi-4-unsloth-bnb-4bit · model · 12k dl · ♡ 64
- 🤗 unsloth/phi-4-GGUF · model · 2.5k dl · ♡ 180
Taxonomy
Topics: Advanced Research in Science and Engineering
