Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

Yoichi Ochiai

arXiv:2605.16259 · cs.LG · May 19, 2026

TL;DR

This paper systematically optimizes real-time diffusion model inference on the Apple M3 Ultra, identifies challenges and solutions that differ from those on CUDA-based platforms, and reaches 22.7 FPS for camera img2img transformation.
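For scale, a frame rate converts to a per-frame time budget of 1000 / FPS milliseconds. The short sketch below does that arithmetic for the 22.7 FPS figure; the function name `frame_budget_ms` is illustrative and does not come from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 22.7 FPS is the figure reported in the summary above.
budget = frame_budget_ms(22.7)
print(f"{budget:.1f} ms per frame")
```

At 22.7 FPS the budget is roughly 44 ms per frame. When pipeline stages overlap, that budget applies to the slowest stage rather than to the sum of all stages.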

Contribution

The paper demonstrates that optimization strategies effective on CUDA do not transfer directly to Apple Silicon, and it provides practical guidelines for diffusion inference on this platform.

Findings

01. Quantization does not speed up inference on Apple Silicon.

02. Parallel inference is ineffective on Apple Silicon's architecture.

03. Combining CoreML conversion with a 3-thread pipeline achieves real-time performance.

Abstract

While real-time image generation using diffusion models has advanced rapidly on NVIDIA GPUs, systematic optimization research on non-CUDA platforms such as Apple Silicon remains extremely limited. In this study, we conducted comprehensive optimization experiments across 10 phases targeting the Apple M3 Ultra (60-core GPU, 512 GB unified memory) with the goal of achieving real-time camera img2img transformation. We explored a wide range of techniques including CoreML conversion, quantization, Token Merging, Neural Engine utilization, compact model exploration, frame interpolation, kNN search-based synthesis, pix2pix-turbo, optical flow frame skipping, and knowledge distillation, quantitatively evaluating the effectiveness of each approach. Ultimately, by combining CoreML conversion of the distillation-specialized model SDXS-512 with a 3-thread camera pipeline, we achieved real-time…
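The abstract does not say how the 3-thread camera pipeline divides its work. As a minimal sketch, assuming one thread each for capture, inference, and display (an assumption, not the paper's stated design), the stages can be connected with single-slot queues so that a slow inference stage always receives the newest frame instead of a backlog. The names `run_pipeline`, `put_latest`, and `transform` are hypothetical; `transform` stands in for the CoreML model call, and a plain list stands in for the camera.

```python
import queue
import threading


def put_latest(q: queue.Queue, item) -> None:
    """Put item into a size-1 queue, discarding a stale item if one is waiting.

    Safe only with a single producer per queue, which is the case here.
    """
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the stale frame
            except queue.Empty:
                pass  # the consumer took it first; retry the put


def run_pipeline(frames, transform):
    """Run capture -> inference -> display on three threads.

    Returns the list of frames that reached the display stage. Intermediate
    frames may be dropped; the final frame is always delivered.
    """
    raw_q = queue.Queue(maxsize=1)   # capture -> inference
    out_q = queue.Queue(maxsize=1)   # inference -> display
    stop = object()                  # sentinel marking end of stream
    shown = []

    def capture():
        for frame in frames:         # assumption: stands in for a camera read loop
            put_latest(raw_q, frame)
        raw_q.put(stop)              # blocking put, so the last frame is not dropped

    def infer():
        while True:
            frame = raw_q.get()
            if frame is stop:
                out_q.put(stop)      # blocking put, so the last result is not dropped
                return
            put_latest(out_q, transform(frame))

    def display():
        while True:
            result = out_q.get()
            if result is stop:
                return
            shown.append(result)     # assumption: stands in for drawing to the screen

    threads = [threading.Thread(target=fn) for fn in (capture, infer, display)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown


if __name__ == "__main__":
    results = run_pipeline(range(100), lambda frame: frame * 2)
    print(f"displayed {len(results)} of 100 frames, last = {results[-1]}")
```

How many frames reach the display varies from run to run, because stale frames are dropped on purpose. In this design the sustained frame rate is bounded by the slowest stage rather than by the sum of all three.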
