Local deployment of large-scale music AI models on commodity hardware
Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue

TL;DR
This paper introduces MIDInfinite, a web app that enables local generation of complex music using a large AI model on standard hardware, making advanced music AI more accessible to developers.
Contribution
It demonstrates porting a large music transformer model to the MLC framework for efficient inference across multiple runtimes, including browsers.
Findings
Generates 51 notes/sec on an M3 MacBook Pro, faster than real time in most cases.
Enables endless multi-instrumental MIDI generation in the browser.
Shows potential for broader adoption of music AI models on commodity hardware.
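Whether 51 notes/sec counts as faster than real time depends on how many notes per second the music itself contains. The sketch below illustrates that arithmetic. It is not taken from the paper: the function names and the three note densities are hypothetical, chosen only for illustration, and only the 51 notes/sec figure comes from the findings above.

```python
def real_time_factor(generation_rate: float, note_density: float) -> float:
    """Ratio of notes generated per second to notes played per second.

    A value of 1.0 or more means generation keeps up with playback.
    """
    if generation_rate <= 0 or note_density <= 0:
        raise ValueError("rates must be positive")
    return generation_rate / note_density


def seconds_of_music_buffered(
    generation_rate: float, note_density: float, wall_seconds: float
) -> float:
    """Seconds of music generated ahead of playback after wall_seconds.

    A negative result means playback would have stalled.
    """
    generated_music_seconds = generation_rate * wall_seconds / note_density
    return generated_music_seconds - wall_seconds


GENERATION_RATE = 51.0  # notes/sec, the figure reported in the findings

# Hypothetical note densities (notes per second of music), illustration only.
EXAMPLE_DENSITIES = [("sparse", 10.0), ("moderate", 30.0), ("dense", 80.0)]

if __name__ == "__main__":
    for label, density in EXAMPLE_DENSITIES:
        rtf = real_time_factor(GENERATION_RATE, density)
        status = "keeps up" if rtf >= 1.0 else "falls behind"
        print(f"{label}: real-time factor {rtf:.2f} ({status})")
```

Under these assumed densities, generation stays ahead of playback whenever the piece averages fewer than 51 notes per second, which is consistent with the "most cases" qualifier.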
Abstract
We present MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on commodity hardware. Creating this demo involved porting the Anticipatory Music Transformer, a large language model (LLM) pre-trained on the Lakh MIDI dataset, to the Machine Learning Compilation (MLC) framework. Once the model is ported, MLC facilitates inference on a variety of runtimes, including C++, mobile, and the browser. We envision that MLC has the potential to bridge the gap between the landscape of increasingly capable music AI models and technology more familiar to music software developers. As a proof of concept, we build a web application that allows users to generate endless streams of multi-instrumental MIDI in the browser, either from scratch or conditioned on a prompt. On commodity hardware (an M3 MacBook Pro), our demo can generate 51…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention
