Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP

Tornike Karchkhadze; Shlomo Dubnov

arXiv:2604.07612 · cs.SD · April 10, 2026

TL;DR

This paper introduces a real-time AI music accompaniment system using latent diffusion models integrated with MAX/MSP, enabling live musical interactions with reduced latency and high coherence.

Contribution

The paper presents a framework that couples a latent diffusion model with MAX/MSP, a widely used real-time music environment, to generate accompaniment at low latency.

Findings

01. Consistency distillation achieved a 5.4x reduction in sampling time.

02. Both models operate in real time with strong musical coherence.

03. Performance degrades gracefully with increased look-ahead, balancing latency and quality.
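
The latency/quality balance in the last finding can be pictured with a small scheduling sketch. It assumes, as an illustration rather than the paper's stated protocol, that at time `now` the model sees the most recent `context_len` seconds of audio and must produce a chunk that starts `lookahead` seconds in the future, so generation plus transport has to finish within the look-ahead. All names and durations here (`plan_window`, `meets_deadline`, the example values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowPlan:
    """Time spans (in seconds) for one step of a sliding-window scheme."""
    context_start: float
    context_end: float
    target_start: float
    target_end: float


def plan_window(now: float, context_len: float,
                lookahead: float, hop: float) -> WindowPlan:
    """Hypothetical planner: context ends at `now`, target starts `lookahead` later."""
    target_start = now + lookahead
    return WindowPlan(
        context_start=max(0.0, now - context_len),  # clamp at stream start
        context_end=now,
        target_start=target_start,
        target_end=target_start + hop,
    )


def meets_deadline(inference_time: float, transport_time: float,
                   lookahead: float) -> bool:
    """The chunk is ready in time only if total delay fits inside the look-ahead."""
    return inference_time + transport_time <= lookahead


if __name__ == "__main__":
    plan = plan_window(now=10.0, context_len=6.0, lookahead=1.0, hop=0.5)
    print(plan)
    print(meets_deadline(inference_time=0.4, transport_time=0.05, lookahead=1.0))
```

Under these assumptions, a longer look-ahead relaxes the deadline but asks the model to predict audio further from its context, which is one plausible reading of why quality falls off gradually as look-ahead grows.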

Abstract

We present a framework for real-time human-AI musical co-performance, in which a latent diffusion model generates instrumental accompaniment in response to a live stream of context audio. The system combines a MAX/MSP front-end, which handles real-time audio input, buffering, and playback, with a Python inference server running the generative model; the two communicate via OSC/UDP messages. This allows musicians to perform in MAX/MSP, a well-established, real-time capable environment, while interacting with a large-scale Python-based generative model, overcoming the fundamental disconnect between real-time music tools and state-of-the-art AI models. We formulate accompaniment generation as a sliding-window look-ahead protocol, training the model to predict future audio from partial context, where system latency is a critical constraint. To reduce latency, we apply consistency distillation to our…
