INTELLECT-1 Technical Report
Sami Jaghouar, Jack Min Ong, Manveer Basra, Fares Obeid, Jannik Straube, Michael Keiblinger, Elie Bakouch, Lucas Atkins, Maziyar Panahi, Charles Goddard, Max Ryabinin, Johannes Hagemann

TL;DR
This paper presents INTELLECT-1, a 10-billion-parameter language model trained collaboratively across the globe using a novel distributed training framework, demonstrating the viability of decentralized, community-driven large-scale model training.
Contribution
It introduces PRIME, a scalable fault-tolerant distributed training framework, and demonstrates successful training of a large language model in a decentralized, community-driven setting.
Findings
Achieved 83-96% compute utilization during training.
Reduced communication bandwidth by 400x with custom all-reduce.
Trained a 10-billion-parameter model across global nodes.
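The utilization figures above are ratios of achieved to available throughput. As a rough illustration of how a model FLOPS utilization (MFU) number can be estimated, the sketch below uses the common approximation of about 6 FLOPs per parameter per training token. The function name, the throughput, and the hardware peak are all hypothetical values chosen for easy arithmetic; they are not taken from the report and this is not necessarily how the authors measured MFU.

```python
def model_flops_utilization(n_params: float,
                            tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Estimate MFU as achieved training FLOP/s divided by hardware peak FLOP/s.

    Uses the common ~6 * N FLOPs-per-token approximation for a dense
    transformer (forward plus backward pass), ignoring attention FLOPs.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# All inputs below are hypothetical, for illustration only.
n_params = 10e9                 # a 10-billion-parameter model
tokens_per_second = 50_000.0    # assumed throughput of one node
peak = 8 * 1e15                 # assumed node: 8 accelerators at 1 PFLOP/s each

mfu = model_flops_utilization(n_params, tokens_per_second, peak)
print(f"MFU: {mfu:.1%}")
```

With these made-up inputs the estimate comes out to 37.5%. Compute utilization is a different ratio (time spent computing versus total wall-clock time, including communication), so the two figures are not directly comparable.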
Abstract
In this report, we introduce INTELLECT-1, the first 10 billion parameter language model collaboratively trained across the globe, demonstrating that large-scale model training is no longer confined to large corporations but can be achieved through a distributed, community-driven approach. INTELLECT-1 was trained on 1 trillion tokens using up to 14 concurrent nodes distributed across 3 continents, with contributions from 30 independent compute providers dynamically joining and leaving the training process, while maintaining 83-96% compute utilization and 36.2-41.4% model FLOPS utilization. We leverage PRIME, our scalable distributed training framework designed for fault-tolerant, high-performance training on unreliable, globally distributed nodes. Key innovations in PRIME include the ElasticDeviceMesh, which manages dynamic global process groups for fault-tolerant communication across…
Taxonomy
Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Big Data and Digital Economy
