Arcee Trinity Large Technical Report
Varun Singh, Lucas Krauss, Sami Jaghouar, Matej Sirovatka, Charles Goddard, Fares Obied, Jack Min Ong, Jannik Straube, Fern, Aria Harley, Conner Stewart, Colin Kealty, Maziyar Panahi, Simon Kirsten, Anushka Deshpande, Anneketh Vij, Arthur Bresnu, Pranav Veldurthi

TL;DR
This paper introduces the Arcee Trinity models, a family of three sparse Mixture-of-Experts models (Trinity Nano, Trinity Mini, and Trinity Large) trained with the Muon optimizer on 10 to 17 trillion tokens, with open checkpoints.
Contribution
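It presents three sparse MoE models whose architecture combines interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing. For Trinity Large it introduces a new load balancing strategy, Soft-clamped Momentum Expert Bias Updates (SMEBU). The models were trained on 10 to 17 trillion tokens, and their checkpoints are shared.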
Findings
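All three models completed training with zero loss spikes.
Each model activates only a small share of its parameters per token: 13B of 400B for Trinity Large, 3B of 26B for Trinity Mini, and 1B of 6B for Trinity Nano.
The trained model checkpoints are openly available.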
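The parameter counts reported for the three models imply the per-token activation fractions computed below. This is a minimal sketch: the names `TRINITY_PARAMS_B` and `active_fraction` are illustrative, and the inputs are the rounded figures quoted in the abstract, so the resulting percentages are approximate.

```python
# Rounded parameter counts in billions, as quoted in the abstract:
# (total parameters, parameters activated per token).
TRINITY_PARAMS_B = {
    "Trinity Nano": (6, 1),
    "Trinity Mini": (26, 3),
    "Trinity Large": (400, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters used for each token."""
    return active_b / total_b


# Activation fraction per model, derived only from the rounded counts above.
FRACTIONS = {
    name: active_fraction(total_b, active_b)
    for name, (total_b, active_b) in TRINITY_PARAMS_B.items()
}

for name, fraction in FRACTIONS.items():
    print(f"{name}: {fraction:.2%} of parameters active per token")
```

Under these rounded figures, Trinity Large is the sparsest of the three, activating roughly 3% of its parameters per token.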
Abstract
We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. We also report on Trinity Nano, with 6B total parameters and 1B activated per token, and Trinity Mini, with 26B total parameters and 3B activated per token. The models' modern architecture includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for Mixture-of-Experts. For Trinity Large, we also introduce a new MoE load balancing strategy called Soft-clamped Momentum Expert Bias Updates (SMEBU). We train the models using the Muon optimizer. All three models completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion tokens, and Trinity Large was pre-trained on 17 trillion tokens. The model…
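To make "sigmoid routing" and "expert bias updates" concrete, here is a generic sketch of sigmoid-gated top-k routing with a per-expert balancing bias. It is not the Trinity implementation. The function names, the choice to apply the bias only when ranking experts, the renormalization of the selected scores, and the sign-based update rule are all assumptions made for illustration. The SMEBU update itself is not described in this excerpt and is not reproduced here; `update_bias` is a plain stand-in.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(logits, bias, top_k):
    """Select top_k experts for one token.

    logits: router outputs, one per expert.
    bias:   per-expert balancing bias. Assumption: it only influences which
            experts are selected, not the mixing weights.
    Returns (expert_indices, mixing_weights).
    """
    scores = [sigmoid(z) for z in logits]  # independent gate per expert
    ranked = sorted(
        range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True
    )
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]  # renormalize over chosen experts
    return chosen, weights


def update_bias(bias, counts, step=0.01):
    """Generic sign-based bias update (a stand-in, NOT SMEBU).

    Raises the bias of experts that received fewer tokens than average and
    lowers it for experts that received more.
    """
    mean = sum(counts) / len(counts)
    return [
        b + step * ((mean > c) - (mean < c))  # sign(mean - c)
        for b, c in zip(bias, counts)
    ]


# Hypothetical usage: 4 experts, 2 active per token.
experts, weights = route_token([2.0, 0.0, -1.0, 1.0], [0.0] * 4, top_k=2)
new_bias = update_bias([0.0] * 4, counts=[4, 0, 2, 2])
```

Unlike softmax gating, the sigmoid scores each expert independently, which is why the selected scores are renormalized before they are used as mixing weights in this sketch.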
Code & Models
- 🤗 arcee-ai/Trinity-Large-Thinking (model) · ♡ 38
- 🤗 arcee-ai/Trinity-Large-Thinking-GGUF (model) · ♡ 6
- 🤗 arcee-ai/Trinity-Large-Preview (model) · 995 dl · ♡ 169
- 🤗 arcee-ai/Trinity-Large-Thinking-W4A16 (model) · ♡ 3
- 🤗 arcee-ai/Trinity-Large-Thinking-FP8-Block (model) · ♡ 3
- 🤗 arcee-ai/Trinity-Large-Preview-NVFP4 (model) · 15 dl · ♡ 1
- 🤗 arcee-ai/Trinity-Large-Preview-W4A16 (model) · 17k dl · ♡ 6
- 🤗 zone4007/Trinity-Large-Preview (model) · 387 dl
- 🤗 arcee-ai/Trinity-Large-Preview-FP8-Block (model) · 162 dl · ♡ 1
- 🤗 Ares-Realm-Studios/Trinity-Large-Preview-ARSTEST (model) · 274 dl
