Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Jingwei Zuo, Maksim Velikanov, Ilyas Chahed, Younes Belkada, Dhia Eddine Rhayem, Guillaume Kunsch, Hakim Hacid, Hamza Yous, Brahim Farhat, Ibrahim Khadraoui, Mugariya Farooq, Giulia Campesan, Ruxandra Cojocaru, Yasser Djilali, Shi Hu, Iheb Chaabane, Puneesh Khanna

TL;DR
Falcon-H1 introduces a hybrid architecture that combines Transformer attention with State Space Models, reaching state-of-the-art performance and efficiency across a range of tasks with fewer parameters and less training data, while supporting long context lengths.
Contribution
The paper presents Falcon-H1, a family of hybrid large language models that match or outperform larger models while being more efficient, together with comprehensive evaluations and an open-source release.
Findings
Falcon-H1-34B matches or surpasses models up to 70B in performance.
At each smaller scale, Falcon-H1 models rival larger counterparts.
Models support up to 256K context tokens and 18 languages.
Abstract
In this report, we introduce Falcon-H1, a new series of large language models (LLMs) featuring hybrid architecture designs optimized for both high performance and efficiency across diverse use cases. Unlike earlier Falcon models built solely on Transformer or Mamba architectures, Falcon-H1 adopts a parallel hybrid approach that combines Transformer-based attention with State Space Models (SSMs), known for superior long-context memory and computational efficiency. We systematically revisited model design, data strategy, and training dynamics, challenging conventional practices in the field. Falcon-H1 is released in multiple configurations, including base and instruction-tuned variants at 0.5B, 1.5B, 1.5B-deep, 3B, 7B, and 34B parameters. Quantized instruction-tuned models are also available, totaling over 30 checkpoints on Hugging Face Hub. Falcon-H1 models demonstrate state-of-the-art…
Code & Models
- 🤗 tiiuae/Falcon-H1-7B-Base · model · 5.1k dl · ♡ 10
- 🤗 tiiuae/Falcon-H1-0.5B-Instruct · model · 1.7k dl · ♡ 34
- 🤗 tiiuae/Falcon-H1-0.5B-Base · model · 36k dl · ♡ 16
- 🤗 tiiuae/Falcon-H1-1.5B-Base · model · 2.1k dl · ♡ 2
- 🤗 tiiuae/Falcon-H1-1.5B-Deep-Base · model · 4.7k dl · ♡ 6
- 🤗 tiiuae/Falcon-H1-3B-Base · model · 5.0k dl · ♡ 5
- 🤗 tiiuae/Falcon-H1-34B-Base · model · 4.5k dl · ♡ 13
- 🤗 tiiuae/Falcon-H1-1.5B-Instruct · model · 1.4k dl · ♡ 17
- 🤗 tiiuae/Falcon-H1-1.5B-Deep-Instruct · model · 1.9k dl · ♡ 36
- 🤗 tiiuae/Falcon-H1-3B-Instruct · model · 1.3k dl · ♡ 14
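
The checkpoint names listed here follow a regular `tiiuae/Falcon-H1-<size>-<variant>` pattern, so the base and instruction-tuned repositories can be enumerated programmatically. The sketch below is illustrative only: the helper names `falcon_h1_repo_ids` and `load_checkpoint` are hypothetical, the 7B and 34B Instruct IDs are inferred from the abstract rather than shown in the list, and the loading step assumes a `transformers` release that includes Falcon-H1 support.

```python
from itertools import product

# Sizes and variants as named in the abstract and in the checkpoint list.
SIZES = ["0.5B", "1.5B", "1.5B-Deep", "3B", "7B", "34B"]
VARIANTS = ["Base", "Instruct"]


def falcon_h1_repo_ids(org="tiiuae"):
    """Build repo IDs of the form <org>/Falcon-H1-<size>-<variant>.

    Hypothetical helper: some generated IDs (e.g. 34B-Instruct) are
    inferred from the abstract, not taken from the list on this page.
    """
    return [
        f"{org}/Falcon-H1-{size}-{variant}"
        for size, variant in product(SIZES, VARIANTS)
    ]


def load_checkpoint(repo_id):
    """Download and load one checkpoint from the Hugging Face Hub.

    Assumption: the installed transformers version supports Falcon-H1.
    The import is local so that listing repo IDs works without it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    for repo_id in falcon_h1_repo_ids():
        print(repo_id)
```

Calling `load_checkpoint("tiiuae/Falcon-H1-0.5B-Base")` would fetch the smallest base model; the larger checkpoints need correspondingly more memory and download time.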
