Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Reka Team: Aitor Ormazabal; Che Zheng; Cyprien de Masson d'Autume; Dani Yogatama; Deyu Fu; Donovan Ong; Eric Chen; Eugenie Lamprecht; Hai Pham; Isaac Ong; Kaloyan Aleksiev; Lei Li; Matthew Henderson; Max Bain; Mikel Artetxe; Nishant Relan; Piotr Padlewski; Qi Liu; Ren Chen; Samuel Phua; Yazheng Yang; Yi Tay; Yuqi Wang; Zhongkai Zhu; Zhihui Xie

arXiv:2404.12387 · cs.CL · May 9, 2024

TL;DR

Reka introduces a series of multimodal language models, Reka Core, Flash, and Edge, that process text, images, video, and audio. The smaller models are state-of-the-art for their compute class and outperform many larger models, while Core approaches the best frontier models across various benchmarks.

Contribution

The paper presents Reka's new multimodal models, trained from scratch, and reports comprehensive evaluations showing that they outperform many larger models and achieve state-of-the-art results on multiple tasks and benchmarks.

Findings

01 Reka Edge and Flash are state-of-the-art for their compute class and outperform many much larger models.

02 Reka Core approaches the best frontier models on both automatic evaluations and blind human evaluations.

03 Reka Core outperforms GPT-4-0613 on human evaluation and Gemini Ultra on video QA.

Abstract

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute classes. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT-4V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other…
