Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

Jingwei Zuo; Maksim Velikanov; Ilyas Chahed; Younes Belkada; Dhia Eddine Rhayem; Guillaume Kunsch; Hakim Hacid; Hamza Yous; Brahim Farhat; Ibrahim Khadraoui; Mugariya Farooq; Giulia Campesan; Ruxandra Cojocaru; Yasser Djilali; Shi Hu; Iheb Chaabane; Puneesh Khanna; Mohamed El Amine Seddik; Ngoc Dung Huynh; Phuc Le Khac; Leen AlQadi; Billel Mokeddem; Mohamed Chami; Abdalgader Abubaker; Mikhail Lubinets; Kacper Piskorski; Slim Frikha

arXiv:2507.22448·cs.CL·July 31, 2025

TL;DR

Falcon-H1 introduces a hybrid architecture that combines Transformers with State Space Models (SSMs). It achieves state-of-the-art performance and efficiency across a range of tasks while using fewer parameters and less training data, and it supports context lengths of up to 256K tokens.

Contribution

The paper presents Falcon-H1, a novel hybrid large language model architecture that outperforms larger models in both efficiency and performance. The work is backed by comprehensive evaluations and an open-source release.

Findings

01

Falcon-H1-34B matches or surpasses models of up to 70B parameters.

02

The smaller Falcon-H1 models rival larger counterparts at their respective scales.

03

The models support context lengths of up to 256K tokens and cover 18 languages.

Abstract

In this report, we introduce Falcon-H1, a new series of large language models (LLMs) featuring hybrid architecture designs optimized for both high performance and efficiency across diverse use cases. Unlike earlier Falcon models built solely on Transformer or Mamba architectures, Falcon-H1 adopts a parallel hybrid approach that combines Transformer-based attention with State Space Models (SSMs), known for superior long-context memory and computational efficiency. We systematically revisited model design, data strategy, and training dynamics, challenging conventional practices in the field. Falcon-H1 is released in multiple configurations, including base and instruction-tuned variants at 0.5B, 1.5B, 1.5B-deep, 3B, 7B, and 34B parameters. Quantized instruction-tuned models are also available, totaling over 30 checkpoints on Hugging Face Hub. Falcon-H1 models demonstrate state-of-the-art…
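The parallel hybrid idea can be illustrated with a toy sketch. It assumes, hypothetically, that an attention branch and an SSM branch read the same input and that their outputs are concatenated channel-wise, projected back to the model width, and added to a residual stream. Several parts are illustrative only: the names causal_attention, diagonal_ssm and parallel_hybrid_block, the fixed (not learned) projections, and the per-channel linear recurrence. That recurrence is a simplified stand-in for a real SSM layer such as Mamba. This is not the Falcon-H1 implementation.

```python
import math
from typing import List

Vector = List[float]


def causal_attention(xs: List[Vector]) -> List[Vector]:
    """Toy single-head causal self-attention with identity Q/K/V projections."""
    d = len(xs[0])
    outs = []
    for t, q in enumerate(xs):
        # Causal mask: position t attends only to positions 0..t.
        scores = [sum(qi * ki for qi, ki in zip(q, xs[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(sc - m) for sc in scores]
        z = sum(w)
        outs.append([sum(w[s] * xs[s][j] for s in range(t + 1)) / z
                     for j in range(d)])
    return outs


def diagonal_ssm(xs: List[Vector], a: float = 0.9, b: float = 1.0,
                 c: float = 1.0) -> List[Vector]:
    """Toy per-channel linear SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = [0.0] * len(xs[0])  # recurrent state: same size at every step
    outs = []
    for x in xs:
        h = [a * hj + b * xj for hj, xj in zip(h, x)]
        outs.append([c * hj for hj in h])
    return outs


def parallel_hybrid_block(xs: List[Vector]) -> List[Vector]:
    """Run both mixers on the same input, concatenate, project, add residual."""
    attn, ssm = causal_attention(xs), diagonal_ssm(xs)
    d = len(xs[0])
    outs = []
    for x, ya, ys in zip(xs, attn, ssm):
        cat = ya + ys  # channel-wise concatenation, length 2*d
        # Hypothetical fixed output projection W = [I/2 | I/2], mapping 2*d -> d.
        proj = [0.5 * (cat[j] + cat[d + j]) for j in range(d)]
        outs.append([xj + pj for xj, pj in zip(x, proj)])  # residual connection
    return outs


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = parallel_hybrid_block(seq)
print(out[0])  # [2.0, 0.0]
```

In this toy, the attention branch revisits every earlier token at each step, so its cost grows with sequence length. The SSM branch carries only a fixed-size state h forward. Letting both mixers see every token in parallel is one way to combine the precise recall of attention with the constant per-step memory of a recurrent state.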
