NVIDIA Nemotron 3: Efficient and Open Intelligence
NVIDIA: Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi

TL;DR
NVIDIA Nemotron 3 introduces a family of models with advanced reasoning, conversational abilities, and high throughput. The models use a Mixture-of-Experts architecture and are post-trained with reinforcement learning, and NVIDIA plans to release all models and tools openly.
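The TL;DR mentions a Mixture-of-Experts architecture. As a rough illustration of the general idea, the sketch below implements standard top-k softmax routing in plain Python. It is not the paper's LatentMoE design, whose details are not given on this page. A linear router scores every expert, only the k best experts run, and their outputs are mixed with gate weights renormalized to sum to 1. The names `route_top_k` and `moe_forward`, and the toy experts, are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(x: Vector, router: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Score each expert with a linear router, keep the k best,
    and renormalize their softmax probabilities so the gates sum to 1."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x: Vector, router: Sequence[Vector], experts: Sequence[Expert], k: int) -> Vector:
    """Sparse MoE layer: only the k routed experts are evaluated."""
    out = [0.0] * len(x)
    for idx, gate in route_top_k(x, router, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Toy setup (hypothetical): 4 experts, expert i scales its input by (i + 1).
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
print(moe_forward([1.0, 2.0], router, experts, k=2))
```

With k=2, only experts 1 and 0 are evaluated for this input, so per-token compute grows with k rather than with the total number of experts.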
Contribution
The paper presents the Nemotron 3 model family, which introduces LatentMoE and multi-token prediction (MTP) layers, reports state-of-the-art performance, and is openly available for research and deployment.
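The abstract says the MTP (multi-token prediction) layers are included for faster text generation. One common way to turn extra-token predictions into a speedup is speculative decoding, where the cheap predictions serve as a draft that the full model then verifies. The sketch below shows only the generic greedy acceptance rule under that assumption. It is not taken from the paper, and `verify_draft` and `toy_target` are hypothetical names. For clarity it calls the target once per position. A real implementation would score all draft positions in a single batched forward pass.

```python
from typing import Callable, List, Sequence

Token = int
NextToken = Callable[[Sequence[Token]], Token]

def verify_draft(prefix: Sequence[Token], draft: Sequence[Token], target_next: NextToken) -> List[Token]:
    """Greedy draft verification for speculative decoding.

    Draft tokens are accepted while they equal the target model's greedy choice.
    At the first mismatch, the target's own token replaces the draft token and
    verification stops. If every draft token is accepted, one extra token from
    the target is appended, so each step emits at least one token.
    """
    context = list(prefix)
    emitted: List[Token] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            emitted.append(expected)
            return emitted
        emitted.append(tok)
        context.append(tok)
    emitted.append(target_next(context))
    return emitted

# Toy "target model" (hypothetical): always predicts the previous token + 1.
def toy_target(ctx: Sequence[Token]) -> Token:
    return ctx[-1] + 1

print(verify_draft([1, 2], [3, 4, 9], toy_target))
# → [3, 4, 5]  (3 and 4 accepted, 9 replaced by the target's 5)
```

Because greedy verification only keeps tokens that the full model would have produced anyway, the output matches plain greedy decoding. The speedup comes from accepting several tokens per verification step.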
Findings
Nano outperforms comparable models in accuracy and cost-efficiency.
Super is optimized for high-volume, collaborative workloads such as IT automation.
Ultra achieves state-of-the-art accuracy and reasoning capabilities.
Abstract
We introduce the Nemotron 3 family of models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained with multi-environment reinforcement learning, which enables reasoning and multi-step tool use, and they support granular reasoning budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket…
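The abstract states that Super and Ultra are trained with NVFP4. To make the number format concrete, here is a simplified quantize/dequantize round trip based on our understanding of NVFP4: each element is a 4-bit E2M1 float (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), and each block of 16 elements shares a scale. Treat these details as assumptions. The sketch keeps the per-block scale as an ordinary Python float, whereas the real format stores it in FP8 (E4M3) together with a per-tensor FP32 scale. It also says nothing about the paper's training recipe. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Non-negative magnitudes an FP4 E2M1 element can represent (assumed NVFP4 element format).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # assumed number of elements sharing one scale

Block = Tuple[float, List[float]]  # (scale, quantized element values)

def to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value; ties go to the smaller magnitude."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag

def quantize_blockwise(values: Sequence[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    blocks: List[Block] = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        amax = max(abs(v) for v in chunk)
        # Map the block's largest magnitude onto the largest E2M1 value.
        # Simplification: the scale stays a Python float instead of FP8 E4M3.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        blocks.append((scale, [to_e2m1(v / scale) for v in chunk]))
    return blocks

def dequantize_blockwise(blocks: Sequence[Block]) -> List[float]:
    return [scale * q for scale, codes in blocks for q in codes]

print(dequantize_blockwise(quantize_blockwise([12.0, 5.0, -1.0, 0.3])))
# → [12.0, 4.0, -1.0, 0.0]
```

In this example the block scale is 2.0, so 5.0 maps to 2.5, which sits halfway between the grid points 2 and 3. The sketch rounds that tie down to 2 (giving 4.0), while hardware conversions may use a different rounding rule. The small value 0.3 falls below half of the smallest nonzero step and flushes to zero, which illustrates why per-block scaling matters for 4-bit formats.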
Code & Models
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF · model · 18k downloads · ♡ 108
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 · model · 1.4M downloads · ♡ 239
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 · model · 197k downloads · ♡ 316
- 🤗 nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 · model · 1.1M downloads · ♡ 212
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 · model · 47k downloads · ♡ 67
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 · model · 966k downloads · ♡ 701
- 🤗 unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF · model · 79k downloads · ♡ 101
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 · model · 1.2M downloads · ♡ 330
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 · model · 514k downloads · ♡ 132
- 🤗 nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8 · model · 10k downloads · ♡ 19
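The checkpoint names above follow a regular pattern: tier, total size, an optional `A<n>B` suffix, and a precision or packaging format (BF16, FP8, NVFP4, GGUF). This sketch assumes the common MoE naming convention in which `120B-A12B` means roughly 120B total and 12B active parameters per token. Under that assumption, a small parser like the hypothetical `parse_model_id` below can filter the list, for example to pick an FP8 checkpoint. The regular expression only covers the names listed on this page.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches names such as "NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4" or "NVIDIA-Nemotron-3-Nano-4B-GGUF".
_NAME = re.compile(
    r"NVIDIA-Nemotron-3-(?P<tier>Nano|Super|Ultra)"
    r"-(?P<total>\d+)B"
    r"(?:-A(?P<active>\d+)B)?"  # optional active-parameter suffix (assumed meaning)
    r"-(?P<fmt>[A-Za-z0-9]+)"
)

@dataclass(frozen=True)
class ModelId:
    org: str                 # Hugging Face organization, e.g. "nvidia" or "unsloth"
    tier: str                # Nano, Super, or Ultra
    total_b: int             # total parameters, in billions
    active_b: Optional[int]  # active parameters per token, if encoded in the name
    fmt: str                 # weight precision or packaging: BF16, FP8, NVFP4, GGUF

def parse_model_id(repo_id: str) -> ModelId:
    org, _, name = repo_id.rpartition("/")
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized model id: {repo_id!r}")
    active = m.group("active")
    return ModelId(
        org=org,
        tier=m.group("tier"),
        total_b=int(m.group("total")),
        active_b=int(active) if active is not None else None,
        fmt=m.group("fmt"),
    )

ids = [
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8",
    "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF",
]
fp8 = [i for i in ids if parse_model_id(i).fmt == "FP8"]
print(fp8)
# → ['nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8']
```

Names without an `A<n>B` suffix, such as the 4B Nano checkpoints, parse with `active_b=None`. The parser makes no claim about how those models are structured.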
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
