The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes
Redacted by arXiv

TL;DR
This paper provides a comprehensive technical overview of Meta's Llama 4 model family, detailing architecture, training, evaluation, deployment, and licensing to serve as a precise reference for researchers and practitioners.
Contribution
The paper consolidates technical information about Llama 4's variants, architecture, training methods, benchmark results, deployment constraints, and safeguards into a single reference document.
Findings
Developer-reported benchmark results for base and instruction-tuned checkpoints
Deployment constraints across major serving environments, including provider-specific context limits
Architectural features such as routed/shared experts and early-fusion multimodality
Abstract
This document consolidates publicly reported technical details about Meta's Llama 4 model family. It summarizes (i) released variants (Scout and Maverick) and the broader herd context, including the previewed Behemoth teacher model, (ii) architectural characteristics beyond a high-level MoE description, covering routed/shared-expert structure, early-fusion multimodality, and long-context design elements reported for Scout (iRoPE and length generalization strategies), (iii) training disclosures spanning pre-training, mid-training for long-context extension, and post-training methodology (lightweight SFT, online RL, and lightweight DPO) as described in release materials, (iv) developer-reported benchmark results for both base and instruction-tuned checkpoints, and (v) practical deployment constraints observed across major serving environments, including provider-specific context limits and…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Model-Driven Software Engineering Techniques
