# Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music

**Authors:** Hongju Su, Ke Li, Lan Yang, Honggang Zhang, Yi-Zhe Song

arXiv: 2508.20665 · 2025-08-29

## TL;DR

Amadeus is a two-level architecture for symbolic music generation: an autoregressive model handles the note sequence, and a bidirectional discrete diffusion model handles the attributes of each note. The authors report better generation quality than SOTA models, at least a 4× speed-up, and training-free, fine-grained attribute control.

## Contribution

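The paper presents Amadeus, a framework that treats the attributes of a musical note as a concurrent, unordered set and models them with a bidirectional discrete diffusion model, while an autoregressive model handles the note sequence. It also proposes MLSDES and CIEM to improve performance, and compiles the AMD (Amadeus MIDI Dataset) for pre-training and fine-tuning.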

## Key findings

- Outperforms SOTA models across multiple metrics on unconditional and text-conditioned generation
- Achieves at least a 4× speed-up in generation
- Enables training-free, fine-grained control of note attributes
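
To make the two-level idea concrete, here is a minimal Python sketch, not the authors' implementation. An outer loop generates notes one after another, and an inner loop fills in each note's attributes by iterative parallel unmasking, a common way to sample from masked discrete diffusion models. The attribute names, `VOCAB_SIZE`, `toy_denoiser`, and the `fixed` argument are all hypothetical. Pinning attributes through `fixed` is only one plausible reading of training-free control; this summary does not describe the paper's actual mechanism.

```python
import random

MASK = None  # absorbing "masked" state for an attribute
ATTRIBUTES = ["pitch", "duration", "velocity", "position"]  # hypothetical names
VOCAB_SIZE = 16  # hypothetical per-attribute vocabulary size


def toy_denoiser(context, partial, rng):
    """Stand-in for a trained bidirectional model (assumption, random output).

    Returns a (token, confidence) guess for every still-masked attribute.
    A real model would attend to `context` and to all entries of `partial`.
    """
    return {
        name: (rng.randrange(VOCAB_SIZE), rng.random())
        for name, value in partial.items()
        if value is MASK
    }


def decode_note(context, rng, steps=2, fixed=None):
    """Fill in one note's attributes by iterative parallel unmasking."""
    note = {name: MASK for name in ATTRIBUTES}
    note.update(fixed or {})  # pinned attributes are never masked or resampled
    for step in range(steps):
        guesses = toy_denoiser(context, note, rng)
        if not guesses:
            break
        # Commit the most confident guesses now; the last step commits the rest.
        remaining_steps = steps - step
        k = -(-len(guesses) // remaining_steps)  # ceiling division
        best = sorted(guesses, key=lambda n: guesses[n][1], reverse=True)[:k]
        for name in best:
            note[name] = guesses[name][0]
    return note


def generate(num_notes, seed=0, fixed=None):
    """Outer autoregressive loop: each note is conditioned on earlier notes."""
    rng = random.Random(seed)
    notes = []
    for _ in range(num_notes):
        notes.append(decode_note(context=notes, rng=rng, fixed=fixed))
    return notes


if __name__ == "__main__":
    for note in generate(3, fixed={"pitch": 5}):
        print(note)
```

Because no attribute is privileged as the "first" token, the inner loop needs only `steps` model calls per note rather than one call per attribute. That is the intuition behind treating attributes as a set, although the reported 4× speed-up comes from the paper's experiments and not from this toy.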

## Abstract

Existing state-of-the-art symbolic music generation models predominantly adopt autoregressive or hierarchical autoregressive architectures, modelling symbolic music as a sequence of attribute tokens with unidirectional temporal dependencies, under the assumption of a fixed, strict dependency structure among these attributes. However, we observe that using different attributes as the initial token in these models leads to comparable performance. This suggests that the attributes of a musical note are, in essence, a concurrent and unordered set, rather than a temporally dependent sequence. Based on this insight, we introduce Amadeus, a novel symbolic music generation framework. Amadeus adopts a two-level architecture: an autoregressive model for note sequences and a bidirectional discrete diffusion model for attributes. To enhance performance, we propose the Music Latent Space Discriminability Enhancement Strategy (MLSDES), which incorporates contrastive learning constraints that amplify the discriminability of intermediate music representations. The Conditional Information Enhancement Module (CIEM) simultaneously strengthens the note latent vector representation via attention mechanisms, enabling more precise note decoding. We conduct extensive experiments on unconditional and text-conditioned generation tasks. Amadeus significantly outperforms SOTA models across multiple metrics while achieving at least a 4× speed-up. Furthermore, we demonstrate the feasibility of training-free, fine-grained note attribute control with our model. To explore the upper performance bound of the Amadeus architecture, we compile the largest open-source symbolic music dataset to date, AMD (Amadeus MIDI Dataset), supporting both pre-training and fine-tuning.
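
The abstract says MLSDES uses contrastive learning constraints but gives no formula. As an illustration only, the sketch below implements InfoNCE, a widely used contrastive objective that is swapped in here as an assumption; the paper's actual loss may differ. The function names, the use of cosine similarity, and the `temperature` value are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of latent vectors.

    positives[i] is the match for anchors[i]; every other positive in the
    batch acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]  # negative log-softmax of the true pair
    return total / len(anchors)


if __name__ == "__main__":
    latents = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(latents, latents))        # matched pairs: small loss
    print(info_nce(latents, latents[::-1]))  # mismatched pairs: large loss
```

Minimising a loss of this shape pulls matching representations together and pushes non-matching ones apart, which is one way to make intermediate representations more discriminable.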

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20665/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20665/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/2508.20665/full.md

---
Source: https://tomesphere.com/paper/2508.20665