Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

Yusong Wu; Stephen Brade; Aleksandra Teng Ma; Tia-Jane Fowler; Enning Yang; Berker Banar; Aaron Courville; Natasha Jaques; and Cheng-Zhi Anna Huang

arXiv:2511.17879 · cs.LG · May 12, 2026

TL;DR

This paper introduces an adversarial training approach to reduce reward hacking in reinforcement learning for live AI-driven music accompaniment, enhancing diversity and responsiveness.

Contribution

It presents a novel adversarial method that co-evolves a discriminator with the policy to prevent collapse to trivial outputs in live music interaction.
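The idea of co-evolving a discriminator with the policy can be illustrated with a toy sketch. This is not the paper's algorithm: the four-note action space, the `coherence` reward values, the uniform reference distribution `p_data`, the `adv_weight` parameter, and the exact (expected-gradient) updates are all assumptions made for illustration. It only shows the general mechanism: a coherence-only reward lets the policy collapse onto one "safe" output, while adding a reward from a discriminator that is trained alongside the policy keeps the output distribution spread out.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def train(adv_weight, steps=2000, lr_policy=0.5, lr_disc=0.5):
    """Alternate discriminator and policy updates on a toy 4-note problem.

    All quantities here are hypothetical stand-ins, not values from the paper.
    """
    coherence = [1.0, 0.8, 0.8, 0.8]    # toy reward: note 0 is slightly "safer"
    p_data = [0.25, 0.25, 0.25, 0.25]   # toy reference (data) distribution
    logits = [0.0] * 4                  # policy parameters
    disc = [0.0] * 4                    # per-note discriminator logits
    for _ in range(steps):
        p = softmax(logits)
        # Discriminator step: ascend the expected log-likelihood of
        # classifying data notes as real and policy notes as generated.
        for k in range(4):
            s = 1.0 / (1.0 + math.exp(-disc[k]))
            disc[k] += lr_disc * (p_data[k] * (1.0 - s) - p[k] * s)
        # Policy step: exact policy gradient for a softmax policy, with the
        # discriminator logit added to the coherence reward.
        reward = [coherence[k] + adv_weight * disc[k] for k in range(4)]
        baseline = sum(p[k] * reward[k] for k in range(4))
        logits = [logits[k] + lr_policy * p[k] * (reward[k] - baseline)
                  for k in range(4)]
    return softmax(logits)

if __name__ == "__main__":
    plain = train(adv_weight=0.0)        # coherence reward only
    adversarial = train(adv_weight=1.0)  # coherence + discriminator reward
    print("coherence only :", [round(x, 3) for x in plain], round(entropy(plain), 3))
    print("with adversary :", [round(x, 3) for x in adversarial], round(entropy(adversarial), 3))
```

In this toy setting the coherence-only run concentrates almost all probability on note 0, whereas the run with the discriminator reward stays close to the reference distribution with only a mild preference for note 0. A real system would use sampled trajectories and learned networks rather than exact per-note updates.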

Findings

01. Improved output diversity and harmonic coherence in generated music.

02. Enhanced adaptation speed and user agency in live interactions.

03. Quantitative results and a user study confirm the method's effectiveness.

Abstract

Most applications of generative AI involve a sequential interaction in which a person inputs a prompt and waits for a response, and where reaction time and adaptivity are not important factors. In contrast, live jamming is a collaborative interaction that requires real-time coordination and adaptation without access to the other player's future moves, while preserving diversity to sustain a creative flow. Reinforcement learning post-training enables effective adaptation through on-policy interaction, yet it often reduces output diversity by exploiting coherence-based rewards. This collapse, known as "reward hacking", affects many RL post-training pipelines, but is especially harmful in live jamming, where musical creativity relies on dynamic variation and mutual responsiveness. In this paper, we propose a novel adversarial training method on policy-generated trajectories to mitigate…

Videos

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction · SlidesLive