DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues

Joonhyeok Shin; Jaehoon Kang; Yujun Lee; Hannah Lee; Yejin Lee; Yoonji Park; Kyuhong Shim

arXiv:2604.07895 · cs.AI · April 10, 2026

TL;DR

DialBGM introduces a new benchmark dataset for evaluating dialogue-conditioned background music recommendation models, highlighting current models' limitations in matching human preferences.

Contribution

The paper presents DialBGM, a benchmark of 1,200 open-domain daily dialogues, each paired with four candidate music clips and annotated with human preference rankings, enabling standardized evaluation of BGM recommendation models in conversational contexts.

Findings

01

No evaluated model exceeds 35% Hit@1.

02

DialBGM reveals significant gaps between model predictions and human judgments.

03

The benchmark facilitates the development of discourse-aware BGM selection methods.

Abstract

Selecting appropriate background music (BGM) that supports natural human conversation is a common production step in media and interactive systems. In this paper, we introduce dialogue-conditioned BGM recommendation, where a model should select non-intrusive, fitting music for a multi-turn conversation that often contains no music descriptors. To study this novel problem, we present DialBGM, a benchmark of 1,200 open-domain daily dialogues, each paired with four candidate music clips and annotated with human preference rankings. Rankings are determined by background suitability criteria, including contextual relevance, non-intrusiveness, and consistency. We evaluate a wide range of open-source and proprietary models, including audio-language models and multimodal LLMs, and show that current models fall far short of human judgments; no model exceeds 35% Hit@1 when selecting the…
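
To make the Hit@1 figure concrete, the sketch below shows one common way such a metric is computed. It assumes that a "hit" means the model's single chosen clip matches the clip ranked first by human annotators; the truncated abstract does not confirm the exact definition, and the function name `hit_at_1`, the data layout, and the toy values are hypothetical, not taken from the paper. Under this assumption, uniform random guessing over four candidates would score 25% in expectation.

```python
from typing import Sequence


def hit_at_1(predicted: Sequence[int],
             human_rankings: Sequence[Sequence[int]]) -> float:
    """Fraction of dialogues where the model's pick is the human top-ranked clip.

    predicted[i]      -- candidate index the model chose for dialogue i
    human_rankings[i] -- candidate indices ordered from most to least preferred

    Assumption (not from the paper): only the first-ranked clip counts as correct.
    """
    if len(predicted) != len(human_rankings):
        raise ValueError("predicted and human_rankings must have the same length")
    if not predicted:
        raise ValueError("need at least one dialogue")
    hits = sum(
        1 for pick, ranking in zip(predicted, human_rankings)
        if pick == ranking[0]
    )
    return hits / len(predicted)


# Toy data (hypothetical): four dialogues, four candidate clips each.
rankings = [[2, 0, 1, 3], [1, 3, 0, 2], [0, 2, 3, 1], [3, 1, 2, 0]]
picks = [2, 0, 0, 1]

score = hit_at_1(picks, rankings)
print(f"Hit@1 = {score:.2f}")
```

In this toy run the model matches the human top choice in two of four dialogues. The real benchmark may aggregate or break ties differently, so treat this only as an illustration of the metric's general shape.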
