Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience
Wei-Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su

TL;DR
This paper introduces a new editing-based evaluation method for music style transfer models, demonstrating that it provides more nuanced insights than traditional listening tests, and presents an improved neural network model for style transfer.
Contribution
The paper proposes a systematic editing test for evaluating music generation models and introduces a new style transfer model built on the Transformer architecture.
Findings
Editing-test results correlate well with model improvements.
Editing effort reflects the quality of the generated music.
The editing test yields insights beyond those of the listening test.
Abstract
The subjective evaluation of music generation techniques has been mostly done with questionnaire-based listening tests while ignoring the perspectives from music composition, arrangement, and soundtrack editing. In this paper, we propose an editing test to evaluate users' editing experience of music generation models in a systematic way. To do this, we design a new music style transfer model combining the non-chronological inference architecture, autoregressive models and the Transformer, which serves as an improvement from the baseline model on the same style transfer task. Then, we compare the performance of the two models with a conventional listening test and the proposed editing test, in which the quality of generated samples is assessed by the amount of effort (e.g., the number of required keyboard and mouse actions) spent by users to polish a music clip. Results on two target…
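The effort measure mentioned in the abstract (counting the keyboard and mouse actions a user needs to polish a clip) can be illustrated with a minimal sketch. The log format, the action names, and the model labels below are assumptions made for illustration; they are not taken from the paper, and the numbers are made-up placeholders rather than results.

```python
from collections import Counter
from statistics import mean

# Hypothetical log: one (model, clip_id, action_type) tuple per user action.
# Labels and values are illustrative only, not data from the paper.
events = [
    ("baseline", "clip1", "key"),
    ("baseline", "clip1", "mouse"),
    ("baseline", "clip1", "mouse"),
    ("baseline", "clip2", "key"),
    ("proposed", "clip1", "mouse"),
    ("proposed", "clip2", "key"),
    ("proposed", "clip2", "mouse"),
]


def editing_effort(events):
    """Return the mean number of logged actions per clip for each model."""
    # Count every action (keyboard or mouse) per (model, clip) pair.
    per_clip = Counter((model, clip) for model, clip, _action in events)
    by_model = {}
    for (model, _clip), n_actions in per_clip.items():
        by_model.setdefault(model, []).append(n_actions)
    return {model: mean(counts) for model, counts in by_model.items()}


effort = editing_effort(events)
for model, value in sorted(effort.items()):
    print(model, value)
```

In this sketch a clip that needed no edits leaves no events and therefore drops out of the average; a real log would need to record such clips explicitly.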
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Adam
