Instella: Fully Open Language Models with Stellar Performance
Jiang Liu, Jialian Wu, Xiaodong Yu, Yusheng Su, Prakamya Mishra, Gowtham Ramesh, Sudhanshu Ranjan, Chaitanya Manem, Ximeng Sun, Ze Wang, Pratik Prabhanjan Brahma, Zicheng Liu, Emad Barsoum

TL;DR
Instella introduces a fully open, three-billion-parameter language model family that achieves state-of-the-art performance among open models, emphasizing transparency, reproducibility, and versatility across tasks.
Contribution
The paper presents Instella, a fully open-source LLM family trained on openly available data, with specialized variants for long context and mathematical reasoning, setting new benchmarks for open models.
Findings
Achieves state-of-the-art results among fully open models.
Competitiveness with leading open-weight models of similar size.
Demonstrates versatility with specialized variants for long contexts and math tasks.
Abstract
Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or partially open, limiting transparency and reproducibility. In this work, we introduce Instella, a family of fully open three billion parameter language models trained entirely on openly available data and codebase. Powered by AMD Instinct MI300X GPUs, Instella is developed through large-scale pre-training, general-purpose instruction tuning, and alignment with human preferences. Despite using substantially fewer pre-training tokens than many contemporaries, Instella achieves state-of-the-art results among fully open models and is competitive with leading open-weight models of comparable size. We further release two specialized variants: Instella-Long, capable of handling context lengths up to 128K tokens, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗amd/AMD-OLMomodel· ♡ 83♡ 83
- 🤗amd/AMD-OLMo-1Bmodel· 113 dl· ♡ 25113 dl♡ 25
- 🤗amd/AMD-OLMo-1B-SFTmodel· 168 dl· ♡ 21168 dl♡ 21
- 🤗amd/AMD-OLMo-1B-SFT-DPOmodel· 1.3k dl· ♡ 231.3k dl♡ 23
- 🤗amd/Instella-3B-Stage1model· 26 dl· ♡ 1326 dl♡ 13
- 🤗amd/Instella-3Bmodel· 235 dl· ♡ 40235 dl♡ 40
- 🤗amd/Instella-3B-SFTmodel· 181 dl· ♡ 11181 dl♡ 11
- 🤗amd/Instella-3B-Instructmodel· 224 dl· ♡ 59224 dl♡ 59
- 🤗amd/Instella-3B-Long-Instructmodel· 13 dl· ♡ 513 dl♡ 5
- 🤗amd/Instella-3B-Mathmodel· 26 dl· ♡ 726 dl♡ 7
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Machine Learning in Materials Science
