Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
NVIDIA: Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi

TL;DR
Nemotron 3 Nano 30B-A3B is a Mixture-of-Experts hybrid Mamba-Transformer language model. It is more accurate than Nemotron 2 Nano while activating less than half of the parameters per forward pass, reaches up to 3.3x the inference throughput of similarly sized open models, and supports context lengths up to 1M tokens.
Contribution
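Introduces Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer model pretrained on 25 trillion text tokens and post-trained with supervised fine-tuning and large-scale RL on diverse environments. Both the Base and post-trained checkpoints are released on Hugging Face.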
Findings
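Achieves up to 3.3x higher inference throughput than similarly sized open models such as GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507.
Is more accurate than those models on popular benchmarks, and more accurate than Nemotron 2 Nano while activating less than half of the parameters per forward pass.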
Supports context lengths up to 1 million tokens.
Abstract
We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. It achieves up to 3.3x higher inference throughput than similarly-sized open models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.
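The abstract's claim about activating only part of the parameters per forward pass can be illustrated with a generic top-k Mixture-of-Experts routing sketch. This is not the paper's router: the expert count, the top-k value, the softmax-then-renormalize gating, and every name below (`top_k_route`, `moe_forward`, `NUM_EXPERTS`, `TOP_K`) are assumptions chosen for illustration. Reading "30B-A3B" as roughly 30B total and 3B active parameters follows a common naming convention and is likewise an assumption here.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts only.

    Experts that are not selected are never called, which is where the
    compute saving of a sparse MoE layer comes from.
    """
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)
    return out

# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits, TOP_K)
y = moe_forward(3.0, experts, logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts evaluated per token
```

In this toy setup a quarter of the experts run for each input. A real model also has shared, non-expert parameters (attention, Mamba layers, embeddings), so the active-parameter fraction is not simply `TOP_K / NUM_EXPERTS`.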
Code & Models
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF · model · 18k dl · ♡ 108
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 · model · 1.4M dl · ♡ 239
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 · model · 197k dl · ♡ 316
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 · model · 1.1M dl · ♡ 212
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 · model · 47k dl · ♡ 67
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 · model · 966k dl · ♡ 701
- 🤗 unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF · model · 79k dl · ♡ 101
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 · model · 1.2M dl · ♡ 330
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 · model · 514k dl · ♡ 132
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8 · model · 10k dl · ♡ 19
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Artificial Intelligence in Healthcare and Education
