Apriel-Nemotron-15B-Thinker
Shruthan Radhakrishna, Soham Parikh, Gopal Sarda, Anil Turkkan, Quaizar Vohra, Raymond Li, Dhruv Jhamb, Kelechi Ogueji, Aanjaneya Shukla, Oluwanifemi Bamgbose, Toby Liang, Luke Kumar, Oleksiy Ostapenko, Shiva Krishna Reddy Malay, Aman Tiwari, Tara Bogavelli, Vikas Yadav

TL;DR
Apriel-Nemotron-15B-Thinker is a compact 15-billion-parameter language model that performs competitively with larger models while significantly reducing memory and computational requirements.
Contribution
Introduces a new 15-billion-parameter LLM, trained with a four-stage pipeline, that matches the performance of larger models at roughly half their size.
Findings
Performs on par with or better than 32B-class models across benchmarks.
Uses about half the memory footprint of those 32B-class models.
Maintains high reasoning capabilities with fewer resources.
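The "half the memory footprint" finding can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 16-bit weights (2 bytes per parameter) and nominal parameter counts of 15B and 32B; it counts weights only and ignores KV cache and activations. The function name is illustrative and does not come from the paper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the weights alone, in GiB.

    Assumes every parameter is stored at the same precision
    (2 bytes per parameter corresponds to bf16/fp16).
    """
    return num_params * bytes_per_param / 2**30


# Nominal parameter counts (assumed round figures, not exact checkpoint sizes).
small = weight_memory_gib(15e9)   # 15B-parameter model
large = weight_memory_gib(32e9)   # 32B-parameter comparison model
ratio = small / large

print(f"15B weights: {small:.1f} GiB")
print(f"32B weights: {large:.1f} GiB")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the ratio is 15/32, a little under one half, which is consistent with the claim as stated.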
Abstract
While large language models (LLMs) have achieved remarkable reasoning capabilities across domains like code, math and other enterprise tasks, their significant memory and computational costs often preclude their use in practical enterprise settings. To this end, we introduce Apriel-Nemotron-15B-Thinker, a 15-billion-parameter model in the ServiceNow Apriel SLM series that achieves performance competitive with medium-sized state-of-the-art models such as o1-mini, QWQ32B, and EXAONE-Deep-32B while maintaining only half the memory footprint of those alternatives. Apriel-Nemotron-15B-Thinker is trained in a four-stage pipeline comprising 1) Base Model upscaling, 2) Continual Pre-training, 3) Supervised Fine-tuning (SFT), and 4) Reinforcement Learning using GRPO. Comprehensive evaluations across a diverse suite of benchmarks consistently demonstrate that our Apriel-Nemotron-15B-Thinker…
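The abstract names GRPO as the final training stage but gives no details of the setup. As a rough illustration of the core idea only, the sketch below computes group-relative advantages in the generic GRPO style: several responses are sampled for one prompt, and each reward is normalized against the group's mean and standard deviation. The function name, the epsilon value, and the use of population standard deviation are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for responses sampled from the SAME prompt.
    eps: small constant (assumed value) to avoid division by zero
         when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four responses: two judged correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline. A group whose rewards are all identical yields zero advantages and contributes no learning signal.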
Code & Models
- 🤗 ServiceNow-AI/Apriel-Nemotron-15b-Thinker · model · 156 dl · ♡ 126
- 🤗 unsloth/Apriel-1.5-15b-Thinker · model · 23 dl · ♡ 4
- 🤗 unsloth/Apriel-1.5-15b-Thinker-GGUF · model · 1.4k dl · ♡ 47
- 🤗 cyankiwi/Apriel-1.5-15b-Thinker-AWQ-4bit · model · 39 dl · ♡ 2
- 🤗 cyankiwi/Apriel-1.5-15b-Thinker-AWQ-8bit · model · 10 dl · ♡ 1
- 🤗 MagicalAlchemist/Apriel-Nemotron-15b-Thinker-Magic_decensored_MPOA · model · 5 dl · ♡ 1
- 🤗 MagicalAlchemist/Apriel-Nemotron-15b-Thinker-Magic_decensored-v2_MPOA · model · 6 dl · ♡ 1
- 🤗 Magic-Decensored/Apriel-Nemotron-15b-Thinker-Magic_decensored-v2_MPOA-GGUF · model · 187 dl · ♡ 1
