Qwen2.5 Technical Report
Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang

TL;DR
Qwen2.5 is a series of large language models with significant improvements in pre-training data, supervised fine-tuning, and reinforcement learning. It achieves top-tier benchmark performance and serves as a base for instruction-tuned and specialized models.
Contribution
Introduction of Qwen2.5 models with scaled datasets, advanced post-training techniques, and multiple sizes, including proprietary MoE variants, demonstrating state-of-the-art performance and versatility.
Findings
Qwen2.5-72B-Instruct outperforms many open and proprietary models.
The models perform competitively with larger models such as Llama-3-405B.
Qwen2.5 variants are effective in training specialized models for math, coding, and multimodal tasks.
Abstract
In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight…
Code & Models
- 🤗 Qwen/QwQ-32B · model · 66k dl · ♡ 2884
- 🤗 FunAudioLLM/InspireMusic-Base · model · 20 dl · ♡ 17
- 🤗 FunAudioLLM/InspireMusic-1.5B · model · 7 dl · ♡ 7
- 🤗 FunAudioLLM/InspireMusic-1.5B-Long · model · 10 dl · ♡ 39
- 🤗 FunAudioLLM/InspireMusic-1.5B-24kHz · model · 7 dl · ♡ 7
- 🤗 FunAudioLLM/InspireMusic-Base-24kHz · model · 6 dl · ♡ 5
- 🤗 VoidStare/Qwen2.5-14B-Instruct-1M-EXL2-8.0bpw-h8 · model
- 🤗 unsloth/Qwen2.5-7B-Instruct-1M · model · 69 dl · ♡ 2
- 🤗 unsloth/Qwen2.5-7B-Instruct-1M-unsloth-bnb-4bit · model · 14k dl · ♡ 3
- 🤗 unsloth/Qwen2.5-7B-Instruct-1M-bnb-4bit · model · 85 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning and Data Classification
Methods: Balanced Selection
