Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Reka Team: Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen

TL;DR
Reka introduces Reka Core, Flash, and Edge, a series of multimodal language models that process text, images, video, and audio. Edge and Flash achieve state-of-the-art results for their compute class and outperform many larger models, while Core approaches the best frontier models across various benchmarks.
Contribution
The paper presents Reka's new multimodal models, trained from scratch, and provides comprehensive evaluations. The smaller models outperform many larger ones, and the largest model approaches frontier models on both automatic and human evaluations.
Findings
Reka Edge and Flash are state-of-the-art for their compute class and outperform many much larger models.
Reka Core approaches the best frontier models on evaluations and human assessments.
Reka Core outperforms GPT-4-0613 on human evaluation and Gemini Ultra on video QA.
Abstract
We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, image, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · Network On Network
