S2Accompanist: A Semantic-Aware and Structure-Guided Diffusion Model for Music Accompaniment Generation

Huakang Chen; Wenkai Cheng; Guobin Ma; Chunbo Hao; Yuxuan Xia; Mengqi Wei; Zhixian Zhao; Pengcheng Zhu; Hanbing Zhang; Lei Xie

arXiv:2605.17414 · eess.AS · May 19, 2026

TL;DR

S2Accompanist is a diffusion model that enhances music accompaniment generation by incorporating semantic awareness and structural guidance, achieving state-of-the-art results with limited data and computational resources.

Contribution

The paper introduces a semantic-aware, structure-guided diffusion model for music accompaniment generation, together with an automated data pipeline and a semantic-aware Variational Autoencoder fine-tuning strategy.

Findings

01. Achieved state-of-the-art performance on the ATTM Grand Challenge benchmark.

02. Secured first place in the Efficiency Track with only 402M parameters.

03. Demonstrated competitive results compared to larger models.

Abstract

High-fidelity text-to-music generation typically relies on massive proprietary datasets and immense computational resources. Existing models often struggle to generate coherent pure musical accompaniments and lack precise, localized semantic control due to their reliance on coarse, track-level annotations. To address these limitations under constrained data and computing resources, we propose S2Accompanist, a Semantic-Aware and Structure-Guided Diffusion Model developed for the ICME2026 ATTM Grand Challenge. Specifically, we design an automated data pipeline comprising structural segmentation, Large Audio-Language Model driven segment-level captioning, and dual-metric quality grading to overcome the absence of localized metadata in raw datasets. Furthermore, we propose a semantic-aware Variational Autoencoder fine-tuning strategy that explicitly distills foundational LeadSheet…
