Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference
Ramchand Kumaresan

TL;DR
Orion is an open system for directly programming, compiling for, and training large language models on Apple's Neural Engine. It bypasses CoreML's limitations and reports significant speedups and stable training on Apple devices.
Contribution
Orion is, to the authors' knowledge, the first open end-to-end system that combines direct ANE programming, a compiler pipeline, and stable multi-step training, going beyond what CoreML's abstractions allow.
Findings
Achieves 8.5x faster weight updates during training.
Demonstrates stable training of a 110M-parameter transformer in 22 minutes.
Enables GPT-2 inference at over 170 tokens/sec on M4 Max.
Abstract
Over two billion Apple devices ship with a Neural Processing Unit (NPU) - the Apple Neural Engine (ANE) - yet this accelerator remains largely unused for large language model workloads. CoreML, Apple's public ML framework, imposes opaque abstractions that prevent direct ANE programming and do not support on-device training. We present Orion, to our knowledge the first open end-to-end system that combines direct ANE execution, a compiler pipeline, and stable multi-step training with checkpoint resume in a single native runtime, bypassing CoreML entirely via Apple's private _ANEClient and _ANECompiler APIs. Building on prior characterization work by maderix, we extend public knowledge of ANE constraints to a catalog of 20 restrictions on MIL IR programs, memory layout, compilation limits, and numerical behavior, including 14 previously undocumented constraints discovered during Orion…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Machine Learning in Materials Science · Advanced Neural Network Applications
