PerfDojo: Automated ML Library Generation for Heterogeneous Architectures
Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, Torsten Hoefler

TL;DR
PerfDojo is an RL environment for automated, hardware-agnostic optimization of ML libraries. Combined with the LLM- and RL-based PerfLLM methodology, it targets performance improvements across diverse architectures without manual tuning.
Contribution
The paper presents PerfDojo, an RL environment with a human-readable code representation for automatic, portable optimization of ML libraries across heterogeneous hardware.
Findings
Achieves significant performance gains on CPUs and GPUs
Enables hardware-agnostic optimization without prior hardware knowledge
Facilitates human analysis and RL training through interpretable code transformations
Abstract
The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to this is PerfDojo, an environment framing optimization as an RL game using a human-readable, mathematically-inspired code…
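To make the "optimization as an RL game" framing concrete, here is a minimal Python sketch. It is not PerfDojo's actual interface or code representation, which the truncated abstract does not specify. Every name in it (ToyOptEnv, cost, greedy_episode, the two actions) and the cost numbers are hypothetical. The state is the loop order of a matrix multiply, each action is a loop interchange, and the reward is the drop in a made-up cost that stands in for measured runtime.

```python
from typing import Tuple

# Hypothetical toy, NOT PerfDojo's API.
# State: loop order (outermost -> innermost) for C[i][j] += A[i][k] * B[k][j].
# Actions: interchange two adjacent loops.
ACTIONS = ("swap_outer", "swap_inner")


def cost(order: Tuple[str, ...]) -> float:
    """Made-up cost standing in for runtime, assuming row-major arrays.

    Innermost j: B and C are read contiguously        -> cheapest.
    Innermost k: A is contiguous, B is strided        -> middle.
    Innermost i: A and C are both strided             -> most expensive.
    """
    return {"j": 1.0, "k": 2.0, "i": 4.0}[order[-1]]


class ToyOptEnv:
    """Game-like environment: states are programs, actions are transformations."""

    def __init__(self, order: Tuple[str, ...] = ("j", "k", "i")):
        self.initial = tuple(order)
        self.order = self.initial

    def reset(self) -> Tuple[str, ...]:
        self.order = self.initial
        return self.order

    def step(self, action: str):
        before = cost(self.order)
        loops = list(self.order)
        if action == "swap_outer":
            loops[0], loops[1] = loops[1], loops[0]
        elif action == "swap_inner":
            loops[1], loops[2] = loops[2], loops[1]
        else:
            raise ValueError(f"unknown action: {action}")
        self.order = tuple(loops)
        # Reward is the cost reduction achieved by this transformation.
        return self.order, before - cost(self.order)


def greedy_episode(env: ToyOptEnv, max_steps: int = 4):
    """Baseline agent: apply the best strictly improving action until none exists."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        best_action, best_reward = None, 0.0
        for action in ACTIONS:
            _, reward = ToyOptEnv(state).step(action)  # try on a scratch copy
            if reward > best_reward:
                best_action, best_reward = action, reward
        if best_action is None:
            break
        state, reward = env.step(best_action)
        total += reward
    return state, total
```

In this toy, the greedy baseline starting from ("j", "k", "i") stops at ("j", "i", "k") with cost 2.0. Reaching the cheapest order ("i", "k", "j") needs a reward-neutral interchange first. An agent that plans over several steps can find that sequence, which is the kind of search an RL formulation is meant to support.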
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Neural Network Applications
