HAFM: Hierarchical Autoregressive Foundation Model for Music Accompaniment Generation

Jian Zhu; Jianwei Cui; Shihao Chen; Yubang Zhang; Cheng Luo

arXiv:2604.09054 · cs.SD · April 14, 2026

TL;DR

HAFM is a hierarchical autoregressive model that generates instrumental music to accompany vocals, using novel tokenization and Transformer techniques to produce high-quality, time-aligned audio.

Contribution

The paper introduces a dual-rate tokenization scheme and a three-stage hierarchical architecture for improved music accompaniment generation.
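The dual-rate idea can be illustrated with simple frame arithmetic. The sketch below uses the rates reported in the abstract (50 Hz semantic tokens for vocals, 75 Hz acoustic tokens for instrumentals); the function names and the floor-based mapping between the two streams are assumptions made for illustration, not the authors' implementation.

```python
SEMANTIC_HZ = 50  # vocal semantic token rate, as stated in the abstract
ACOUSTIC_HZ = 75  # instrumental acoustic token rate, as stated in the abstract


def frame_counts(duration_s):
    """Return (semantic_frames, acoustic_frames) for a clip of duration_s seconds.

    Hypothetical helper: truncates partial frames at the end of the clip.
    """
    return int(duration_s * SEMANTIC_HZ), int(duration_s * ACOUSTIC_HZ)


def semantic_index_for(acoustic_idx):
    """Map an acoustic frame index to the semantic frame that covers its start time.

    Acoustic frame j starts at j / 75 s; the semantic frame containing that
    instant is floor(j * 50 / 75). Integer arithmetic avoids float rounding.
    """
    if acoustic_idx < 0:
        raise ValueError("frame index must be non-negative")
    return acoustic_idx * SEMANTIC_HZ // ACOUSTIC_HZ


if __name__ == "__main__":
    print(frame_counts(2))
    print([semantic_index_for(j) for j in range(6)])
```

Because the rates are in a 2:3 ratio, every two semantic frames span exactly three acoustic frames, so the two streams stay time-aligned without having to share a frame rate.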

Findings

01 HAFM achieves a Fréchet Audio Distance of 2.08 on MUSDB18.

02 It outperforms retrieval baselines in quality.

03 It matches state-of-the-art systems with fewer parameters.

Abstract

We present HAFM, a system that generates instrumental music audio to accompany input vocals. Given isolated singing voice, HAFM produces a coherent instrumental accompaniment that can be directly mixed with the input to create complete music. We propose three key innovations over prior work: (1) a dual-rate codec tokenization scheme using HuBERT semantic tokens at 50 Hz for vocals and EnCodec acoustic tokens at 75 Hz for instrumentals, enabling time-aligned yet rate-independent modeling; (2) a three-stage hierarchical autoregressive architecture (semantic to coarse acoustic to fine acoustic) with interleaved multi-codebook prediction and classifier-free guidance; and (3) modern Transformer design choices including QK-norm, GEGLU activations, RMSNorm, and T5-style relative position bias for improved training stability and sequence generalization. Experiments on MUSDB18 demonstrate that…
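Two of the mechanisms named in the abstract, interleaved multi-codebook prediction and classifier-free guidance, can be sketched in a few lines. The abstract does not specify the interleaving pattern or the guidance formula, so the time-major flattening and the common extrapolation form `uncond + scale * (cond - uncond)` used here are assumptions, and all function names are hypothetical.

```python
def interleave(codebooks):
    """Flatten K equal-length codebook streams into one time-major sequence.

    Assumed pattern: frame 0 of every codebook, then frame 1, and so on.
    """
    lengths = {len(stream) for stream in codebooks}
    if len(lengths) != 1:
        raise ValueError("all codebook streams must have the same length")
    return [token for frame in zip(*codebooks) for token in frame]


def deinterleave(flat, num_codebooks):
    """Invert interleave(): recover the K per-codebook streams."""
    if len(flat) % num_codebooks != 0:
        raise ValueError("sequence length must be a multiple of num_codebooks")
    return [flat[k::num_codebooks] for k in range(num_codebooks)]


def cfg_logits(cond, uncond, scale):
    """Combine conditional and unconditional logits with a guidance scale.

    scale = 0 gives the unconditional logits, scale = 1 the conditional ones,
    and scale > 1 pushes further toward the conditioning signal.
    """
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    flat = interleave([[1, 2, 3], [10, 20, 30]])
    print(flat)
    print(deinterleave(flat, 2))
    print(cfg_logits([2.0, 0.0], [1.0, 1.0], 3.0))
```

In an autoregressive decoder, a flattened sequence of this kind lets a single Transformer predict every codebook token in turn, while the guided logits would be computed at each step from one pass with the vocal conditioning and one pass without it.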

Code & Models

Repositories

HackerHyper/HAFM (GitHub)

Models

zhuqijian/HAFM (Hugging Face model)