TL;DR
Metal-Sci introduces a benchmark and an automated kernel search method for scientific compute tasks on Apple Silicon, reporting in-distribution speedups of up to 10.7× and a structural validation approach built on a held-out scoring function.
Contribution
The paper presents a new benchmark suite and an automated search framework for optimizing scientific kernels on Apple Silicon, incorporating a structural validation method with a held-out scoring function.
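The summary does not spell out how the held-out check decides whether a candidate is accepted. A minimal sketch, assuming the gate rejects any candidate that is incorrect on the held-out size or whose held-out fitness falls below the baseline kernel's by more than a tolerance, might look like this. `SizeResult`, `passes_held_out_gate`, and `max_regression` are hypothetical names, not the authors' API.

```python
from dataclasses import dataclass


@dataclass
class SizeResult:
    """Hypothetical per-size result for one kernel at one problem size."""
    correct: bool   # output matched the CPU reference within tolerance
    fitness: float  # fraction of the roofline bound achieved (0..1)


def passes_held_out_gate(candidate: SizeResult, baseline: SizeResult,
                         max_regression: float = 0.0) -> bool:
    """Hypothetical gate on the held-out size (assumed rule, not the paper's).

    Accept the candidate only if it is correct on the held-out size and its
    held-out fitness is no worse than the baseline's, allowing a relative
    slack of `max_regression` (e.g. 0.1 permits a 10% drop).
    """
    if not candidate.correct:
        return False
    return candidate.fitness >= baseline.fitness * (1.0 - max_regression)


if __name__ == "__main__":
    baseline = SizeResult(correct=True, fitness=0.40)
    # A kernel tuned only on in-distribution sizes can still regress here.
    regressed = SizeResult(correct=True, fitness=0.30)
    print(passes_held_out_gate(regressed, baseline))  # False
```

The point of gating on a size the search never optimized against is that a candidate overfitted to the in-distribution sizes (for example, one with hard-coded tile shapes) is caught before it is reported as a win.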
Findings
Achieved in-distribution speedups up to 10.7× on Apple Silicon.
Demonstrated the effectiveness of the held-out scoring function in detecting regressions.
Provided open-source code for the benchmark and search framework.
Abstract
We present Metal-Sci, a 10-task benchmark of scientific Apple Silicon Metal compute kernels spanning six optimization regimes (stencils, all-pairs N-body problems, multi-field Boltzmann, neighbor-list molecular dynamics, multi-kernel PDE, FFT). Each task ships a CPU reference, a roofline-anchored fitness function, and a held-out generalization size. We pair the benchmark with a lightweight harness for automatic kernel search that runtime-compiles each candidate, scores it against the roofline across multiple sizes, and feeds structured compile and per-size correctness diagnostics back to a frozen LLM driving an evolutionary loop. We report matched single-model sweeps of Claude Opus 4.7, Gemini 3.1 Pro, and GPT 5.5 on M1 Pro: in-distribution self-speedups span to . Beyond raw speedup, our central methodological claim is structural: the held-out gate…
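The abstract says each candidate is "scored against the roofline across multiple sizes" but does not give the formula. A minimal sketch, assuming fitness is achieved throughput divided by the roofline bound min(peak compute, arithmetic intensity × peak bandwidth), that any incorrect size zeroes the score, and that per-size scores are combined with a geometric mean, is shown below. All function names and the peak figures in the example are hypothetical placeholders, not measured M1 Pro specifications or the paper's actual scoring code.

```python
import math


def roofline_bound_gflops(flops: float, bytes_moved: float,
                          peak_gflops: float, peak_gbps: float) -> float:
    """Attainable GFLOP/s under the classic roofline model."""
    intensity = flops / bytes_moved  # arithmetic intensity in FLOP/byte
    return min(peak_gflops, intensity * peak_gbps)


def roofline_fitness(flops: float, bytes_moved: float, seconds: float,
                     peak_gflops: float, peak_gbps: float) -> float:
    """Fraction of the roofline bound achieved at one problem size."""
    achieved_gflops = flops / seconds / 1e9
    return achieved_gflops / roofline_bound_gflops(
        flops, bytes_moved, peak_gflops, peak_gbps)


def multi_size_fitness(per_size, peak_gflops: float, peak_gbps: float) -> float:
    """Hypothetical aggregate over sizes.

    `per_size` is a list of (flops, bytes_moved, seconds, correct) tuples.
    Assumptions: an incorrect result at any size zeroes the score, and the
    per-size fractions are combined with a geometric mean.
    """
    scores = []
    for flops, nbytes, secs, correct in per_size:
        if not correct:
            return 0.0
        scores.append(roofline_fitness(flops, nbytes, secs, peak_gflops, peak_gbps))
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


if __name__ == "__main__":
    # Placeholder peaks; a memory-bound stencil at 0.125 FLOP/byte.
    PEAK_GFLOPS, PEAK_GBPS = 5000.0, 200.0
    runs = [(1e9, 8e9, 0.125, True), (4e9, 32e9, 0.5, True)]
    print(multi_size_fitness(runs, PEAK_GFLOPS, PEAK_GBPS))  # about 0.32
```

Anchoring to the roofline rather than to raw runtime makes scores comparable across regimes: a memory-bound stencil and a compute-bound N-body kernel are both judged by how close they get to their own hardware ceiling. The geometric mean is one common choice because it keeps a single very fast size from masking a slow one.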
