Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin; Jyoti Aneja; Hany Awadalla; Ahmed Awadallah; Ammar Ahmad Awan; Nguyen Bach; Amit Bahree; Arash Bakhtiari; Jianmin Bao; Harkirat Behl; Alon Benhaim; Misha Bilenko; Johan Bjorck; Sébastien Bubeck; Martin Cai; Qin Cai; Vishrav Chaudhary; Dong Chen; Dongdong Chen; Weizhu Chen; Yen-Chun Chen; Yi-Ling Chen; Hao Cheng; Parul Chopra; Xiyang Dai; Matthew Dixon; Ronen Eldan; Victor Fragoso; Jianfeng Gao; Mei Gao; Min Gao; Amit Garg; Allie Del Giorno; Abhishek Goswami; Suriya Gunasekar; Emman Haider; Junheng Hao; Russell J. Hewett; Wenxiang Hu; Jamie Huynh; Dan Iter; Sam Ade Jacobs; Mojan Javaheripi; Xin Jin; Nikos Karampatziakis; Piero Kauffmann; Mahoud Khademi; Dongwoo Kim; Young Jin Kim; Lev Kurilenko; James R. Lee; Yin Tat Lee; Yuanzhi Li; Yunsheng Li; Chen Liang; Lars Liden; Xihui Lin; Zeqi Lin; Ce Liu; Liyuan Liu; Mengchen Liu; Weishung Liu; Xiaodong Liu; Chong Luo; Piyush Madan; Ali Mahmoudzadeh; David Majercak; Matt Mazzola; Caio César Teodoro Mendes; Arindam Mitra; Hardik Modi; Anh Nguyen; Brandon Norick; Barun Patra; Daniel Perez-Becker; Thomas Portet; Reid Pryzant; Heyang Qin; Marko Radmilac; Liliang Ren; Gustavo de Rosa; Corby Rosset; Sambudha Roy; Olatunji Ruwase; Olli Saarikivi; Amin Saied; Adil Salim; Michael Santacroce; Shital Shah; Ning Shang; Hiteshi Sharma; Yelong Shen; Swadheen Shukla; Xia Song; Masahiro Tanaka; Andrea Tupini; Praneetha Vaddamanu; Chunyu Wang; Guanhua Wang; Lijuan Wang; Shuohang Wang; Xin Wang; Yu Wang; Rachel Ward; Wen Wen; Philipp Witte; Haiping Wu; Xiaoxia Wu; Michael Wyatt; Bin Xiao; Can Xu; Jiahang Xu; Weijian Xu; Jilong Xue; Sonali Yadav; Fan Yang; Jianwei Yang; Yifan Yang; Ziyi Yang; Donghan Yu; Lu Yuan; Chenruidong Zhang; Cyril Zhang; Jianwen Zhang; Li Lyna Zhang; Yi Zhang; Yue Zhang; Yunan Zhang; Xiren Zhou

arXiv:2404.14219 · cs.CL · September 4, 2024 · 150 cites

TL;DR

This paper introduces a series of highly capable, small-scale language models, including phi-3-mini, that achieve performance comparable to larger models and can be deployed on phones, with extensions for multilingual, multimodal, and long-context capabilities.

Contribution

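The paper presents phi-3-mini (3.8B parameters) together with the larger phi-3-small (7B) and phi-3-medium (14B) models, trained on a scaled-up dataset of heavily filtered public web data and synthetic data, and extends the family with multilingual, multimodal, and long-context capabilities.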

Findings

01 phi-3-mini rivals GPT-3.5 on benchmarks

02 phi-3.5-MoE outperforms similar open-source models

03 phi-3.5-Vision handles multi-image and text prompts effectively

Abstract

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce…

Taxonomy

Topics: Computational Physics and Python Applications

Methods: Attention Is All You Need · LLaMA · Mixture of Experts · Byte Pair Encoding · Dense Connections · Residual Connection · Softmax