ZAYA1-8B Technical Report

Robert Washbourne; Rishi Iyer; Tomas Figliolia; Henry Zheng; Ryan Lorig-Roach; Sungyeon Yang; Pritish Yuvraj; Quentin Anthony; Yury Tokpanov; Xiao Yang; Ganesh Nanduru; Stephen Ebert; Praneeth Medepalli; Skyler Szot; Srivatsan Rajagopal; Alex Ong; Bhavana Mehta; Beren Millidge

arXiv:2605.05365·cs.AI·May 8, 2026

TL;DR

ZAYA1-8B is a mixture-of-experts model with 700M active and 8B total parameters, optimized for reasoning tasks. It achieves competitive performance on mathematics and coding benchmarks through its training recipe and test-time compute techniques.

Contribution

The paper introduces ZAYA1-8B, a reasoning-focused MoE model with novel training, fine-tuning, and test-time aggregation methods, including Markovian RSA, to enhance reasoning performance.

Findings

01 ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks with under 1B active parameters.

02 Markovian RSA improves the aggregation of reasoning traces at test time, boosting performance.

03 ZAYA1-8B achieves 91.9% on AIME'25 and 89.6% on HMMT'25.

Abstract

We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming…
