Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen

TL;DR
This paper introduces a series of highly capable, small-scale language models, including phi-3-mini, that achieve performance comparable to larger models and can be deployed on phones, with extensions for multilingual, multimodal, and long-context capabilities.
Contribution
The paper presents new small and medium-sized language models with high performance, scalable training data, and enhanced capabilities for multilingual, multimodal, and long-context tasks.
Findings
phi-3-mini rivals GPT-3.5 on benchmarks
phi-3.5-MoE outperforms similar open-source models
phi-3.5-Vision handles multi-image and text prompts effectively
Abstract
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench, respectively). To enhance multilingual, multimodal, and long-context capabilities, we introduce…
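To make the "small enough to be deployed on a phone" claim concrete, here is a rough back-of-the-envelope sketch of weight storage at different precisions. The parameter counts come from the abstract above; the bit widths (16-bit and 4-bit) and the helper name `weight_memory_gb` are assumptions chosen for illustration, and the estimate ignores activations, the KV cache, and any quantization overhead.

```python
# Rough weight-memory estimate. Parameter counts are taken from the abstract;
# the bit widths are illustrative assumptions, not figures stated in this summary.
PARAMS = {
    "phi-3-mini": 3.8e9,
    "phi-3-small": 7e9,
    "phi-3-medium": 14e9,
}


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, n in PARAMS.items():
        fp16 = weight_memory_gb(n, 16)
        int4 = weight_memory_gb(n, 4)
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions, phi-3-mini needs roughly 7.6 GB for 16-bit weights and about 1.9 GB at 4 bits per weight, which is the regime where fitting in a phone's memory becomes plausible.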
Code & Models
- 🤗 microsoft/Phi-3.5-mini-instruct · model · 919k dl · ♡ 966
- 🤗 microsoft/Phi-mini-MoE-instruct · model · 100k dl · ♡ 32
- 🤗 askalgore/Phi-3.5-mini-instruct-heretic · model · 10 dl · ♡ 1
- 🤗 Jaward/phi-3-mini-4k-instruct.Q4_0.gguf · model · 402 dl · ♡ 3
- 🤗 NexaAI/octo-net · model · 36 dl · ♡ 144
- 🤗 cortexso/phi3 · model · 6 dl · ♡ 1
- 🤗 lmstudio-community/Phi-3.1-mini-4k-instruct-GGUF · model · 2.4k dl · ♡ 23
- 🤗 lmstudio-community/Phi-3.1-mini-128k-instruct-GGUF · model · 1.7k dl · ♡ 7
- 🤗 microsoft/Phi-3.5-vision-instruct · model · 1.3M dl · ♡ 728
- 🤗 microsoft/Phi-3.5-MoE-instruct · model · 94k dl · ♡ 571
Taxonomy
Topics: Computational Physics and Python Applications
Methods: Attention Is All You Need · LLaMA · Mixture of Experts · Byte Pair Encoding · Dense Connections · Residual Connection · Softmax
