Reactor Mk.1 performances: MMLU, HumanEval and BBH test results

TJ Dunham; Henry Syahputra

arXiv:2406.10515 · cs.AI · July 29, 2024

TL;DR

Reactor Mk.1, a large language model with fewer than 100 billion parameters built on the Lychee AI engine, reports scores of 92% on MMLU, 91% on HumanEval, and 88% on BBH, surpassing several prominent models.

Contribution

This paper introduces Reactor Mk.1 and reports its results on the MMLU, HumanEval, and BBH benchmarks, which the authors present as evidence of its efficiency and of its standing among leading AI models.

Findings

01 Achieved 92% on the MMLU dataset

02 Scored 91% on the HumanEval dataset

03 Reached 88% on the BBH dataset

Abstract

The paper presents the performance of Reactor Mk.1, ARC's flagship large language model, through an analysis of benchmark results. The model uses the Lychee AI engine and has fewer than 100 billion parameters, combining efficiency with strong performance. Reactor Mk.1 outperformed models such as GPT-4o, Claude Opus, and Llama 3, scoring 92% on the MMLU dataset, 91% on the HumanEval dataset, and 88% on the BBH dataset. It excels both at handling difficult tasks and at reasoning, establishing itself as a prominent solution among current state-of-the-art AI systems.

Taxonomy

Topics: Nuclear reactor physics and engineering · Nuclear and radioactivity studies · Nuclear Engineering Thermal-Hydraulics

Methods: LLaMA