Qwen3 Technical Report
An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang

TL;DR
Qwen3 is a versatile large language model series that integrates thinking and non-thinking modes, offers adaptive resource allocation, and achieves state-of-the-art results across multiple benchmarks with expanded multilingual support.
Contribution
The paper introduces Qwen3, a unified LLM framework with dynamic switching between thinking and non-thinking modes, a thinking budget mechanism, and reduced compute requirements for building smaller models, advancing performance and multilingual capabilities.
Findings
Achieves state-of-the-art results on diverse benchmarks.
Expands multilingual support from 29 to 119 languages.
Performs competitively against larger MoE models.
Abstract
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Experts (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models, such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B), and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during…
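The abstract names two mechanisms, mode switching driven by the user query and a thinking budget, without describing how either works on this page. The toy sketch below shows one way the two ideas could fit together. It is not the authors' implementation: the flag strings `/think` and `/no_think`, the helper names `select_mode` and `apply_thinking_budget`, and the token-list representation are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed soft-switch flags a user might place in a query (illustrative only).
THINK_FLAG = "/think"
NO_THINK_FLAG = "/no_think"


def select_mode(user_query: str, default_thinking: bool = True) -> bool:
    """Return True for thinking mode, False for non-thinking mode.

    Hypothetical rule: the last flag found in the query wins; with no
    flag present, the default mode is used.
    """
    mode = default_thinking
    for token in user_query.split():
        if token == THINK_FLAG:
            mode = True
        elif token == NO_THINK_FLAG:
            mode = False
    return mode


def apply_thinking_budget(
    thinking_tokens: List[str], budget: int
) -> Tuple[List[str], bool]:
    """Cap a reasoning trace at `budget` tokens.

    Returns the kept tokens and a flag saying whether the trace was cut,
    so a caller could decide to force the final answer early.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    truncated = len(thinking_tokens) > budget
    return thinking_tokens[:budget], truncated


if __name__ == "__main__":
    query = "Prove that 17 is prime. /think"
    if select_mode(query):
        trace = ["check", "divisors", "2", "3", "no", "factors"]
        kept, cut = apply_thinking_budget(trace, budget=4)
        print(kept, cut)
```

In this sketch the budget is a hard cap on the reasoning trace. A real system would also need a way to tell the model to stop reasoning and answer, which is outside what this example covers.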
Code & Models
- 🤗 Qwen/Qwen3-8B · model · 9.5M dl · ♡ 1021
- 🤗 Qwen/Qwen3-VL-8B-Instruct · model · 4.5M dl · ♡ 847
- 🤗 Qwen/Qwen3-4B-Instruct-2507 · model · 6.3M dl · ♡ 790
- 🤗 Qwen/Qwen3-0.6B · model · 13.6M dl · ♡ 1165
- 🤗 unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF · model · 132k dl · ♡ 564
- 🤗 Qwen/Qwen3-4B · model · 7.0M dl · ♡ 584
- 🤗 Qwen/Qwen3-1.7B · model · 7.0M dl · ♡ 437
- 🤗 Qwen/Qwen3-Coder-30B-A3B-Instruct · model · 1.1M dl · ♡ 990
- 🤗 Qwen/Qwen3-1.7B-Base · model · 383k dl · ♡ 70
- 🤗 Qwen/Qwen3-30B-A3B · model · 1.5M dl · ♡ 872
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
Methods: Mixture of Experts
