Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

Víctor Gallego

arXiv:2605.09708·cs.LG·May 12, 2026

TL;DR

Metal-Sci introduces a 10-task benchmark and a lightweight automated kernel-search harness for scientific Metal compute kernels on Apple Silicon, reporting in-distribution self-speedups of up to 10.7× on M1 Pro and using a held-out generalization size per task as a validation gate.

Contribution

The paper contributes a benchmark suite of 10 scientific Metal compute kernels and an LLM-driven evolutionary search harness for optimizing them on Apple Silicon. Each task ships a CPU reference, a roofline-anchored fitness function, and a held-out generalization size used for structural validation.

Findings

01

Achieved in-distribution self-speedups ranging from 1.00× to 10.7× on M1 Pro in matched single-model sweeps of Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.5.

02

Demonstrated the effectiveness of the held-out gate in detecting regressions.

03

Provided open-source code for the benchmark and search framework.

Abstract

We present Metal-Sci, a 10-task benchmark of scientific Apple Silicon Metal compute kernels spanning six optimization regimes (stencils, all-pairs in $n$-body problems, multi-field Boltzmann, neighbor-list molecular dynamics, multi-kernel PDE, FFT). Each task ships a CPU reference, a roofline-anchored fitness function, and a held-out generalization size. We pair the benchmark with a lightweight harness for automatic kernel search that runtime-compiles each candidate, scores it against the roofline across multiple sizes, and feeds structured compile and per-size correctness diagnostics back to a frozen LLM driving a $(1+1)$ evolutionary loop. We report matched single-model sweeps of Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.5 on M1 Pro: in-distribution self-speedups span $1.00\times$ to $10.7\times$. Beyond raw speedup, our central methodological claim is structural: the held-out gate…
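
The search loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's harness. The roofline-fraction definition, the geometric-mean aggregation across sizes, and every function name here (`roofline_fraction`, `fitness`, `one_plus_one`, `toy_propose`, `toy_evaluate`) are assumptions made for illustration. The toy `propose` callback stands in for the frozen LLM, and the toy `evaluate` callback stands in for runtime compilation and timing of a Metal kernel.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

def roofline_fraction(bytes_moved: float, flops: float, seconds: float,
                      peak_bw: float, peak_flops: float) -> float:
    """Fraction of the roofline bound achieved by one run (assumed definition)."""
    # The ideal time is set by whichever resource is the bottleneck:
    # memory traffic or arithmetic throughput.
    ideal_seconds = max(bytes_moved / peak_bw, flops / peak_flops)
    return ideal_seconds / seconds

def fitness(per_size: List[Optional[float]]) -> float:
    """Geometric mean over problem sizes (assumed aggregation).

    A None entry marks a size that failed to compile or failed the
    correctness check; any failure zeroes the score.
    """
    if any(f is None or f <= 0 for f in per_size):
        return 0.0
    return math.exp(sum(math.log(f) for f in per_size) / len(per_size))

Evaluation = Tuple[List[Optional[float]], str]  # (per-size fractions, diagnostics)

def one_plus_one(seed, propose: Callable, evaluate: Callable[..., Evaluation],
                 iterations: int):
    """(1+1) evolution: one parent, one child per step, keep the better one."""
    parent = seed
    fractions, diagnostics = evaluate(parent)
    parent_fit = fitness(fractions)
    for _ in range(iterations):
        # The proposer sees the parent and the most recent diagnostics.
        child = propose(parent, diagnostics)
        fractions, diagnostics = evaluate(child)
        child_fit = fitness(fractions)
        if child_fit >= parent_fit:  # ties go to the child
            parent, parent_fit = child, child_fit
    return parent, parent_fit

# Toy stand-ins: a candidate is just an integer "tile size".
rng = random.Random(0)

def toy_evaluate(tile: int) -> Evaluation:
    frac = 1.0 / (1.0 + abs(tile - 32) / 32)  # peaks at tile == 32
    return [frac, 0.9 * frac], f"tile={tile}"

def toy_propose(tile: int, diagnostics: str) -> int:
    return max(1, tile + rng.choice([-8, -4, 4, 8]))

seed_fit = fitness(toy_evaluate(4)[0])
best, best_fit = one_plus_one(4, toy_propose, toy_evaluate, iterations=50)
print(f"seed fitness {seed_fit:.3f}, best tile {best}, best fitness {best_fit:.3f}")
```

Because the parent is replaced only when the child scores at least as well, the in-distribution fitness can never decrease over the run. That same property means the loop can overfit to the scored sizes, which is the kind of failure a separate held-out size is positioned to catch.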

Code & Models

Repositories

vicgalle/metal-sci-kernels (GitHub)