The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-composed Music through Quantitative Measures
Shih-Lun Wu, Yi-Hsuan Yang

TL;DR
This paper introduces the Jazz Transformer, a neural model for Jazz music generation, and critically analyzes its shortcomings through quantitative measures, highlighting the gap between AI-generated and human compositions.
Contribution
The paper presents the Jazz Transformer, a Transformer-XL-based model that incorporates structural events from the Weimar Jazz Database (WJazzD), and provides a quantitative analysis of its musical output from several perspectives.
Findings
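Training loss is reduced to a low value, but the listening test shows a clear gap between the average ratings of generated and real compositions
Computational analysis reveals differences from human music in pitch class statistics, grooving, chord progression, and structure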
Identifies key areas for future improvement in AI music generation
Abstract
This paper presents the Jazz Transformer, a generative model that utilizes a neural sequence model called the Transformer-XL for modeling lead sheets of Jazz music. Moreover, the model endeavors to incorporate structural events present in the Weimar Jazz Database (WJazzD) for inducing structures in the generated music. While we are able to reduce the training loss to a low value, our listening test suggests, however, a clear gap between the average ratings of the generated and real compositions. We therefore go one step further and conduct a series of computational analyses of the generated compositions from different perspectives. This includes analyzing the statistics of the pitch class, grooving, and chord progression, assessing the structureness of the music with the help of the fitness scape plot, and evaluating the model's understanding of Jazz music through a MIREX-like…
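As a rough illustration of the pitch class statistics the abstract mentions, the sketch below builds a 12-bin pitch-class histogram from MIDI note numbers and computes its Shannon entropy. The function names and the choice of entropy as the summary statistic are assumptions made for illustration; the text above does not specify how the paper defines its measures.

```python
import math
from collections import Counter


def pitch_class_histogram(midi_pitches):
    """Return a normalized 12-bin histogram of pitch classes (C=0, ..., B=11).

    Hypothetical helper: assumes notes are given as integer MIDI pitch numbers.
    """
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pitch")
    return [counts.get(pc, 0) / total for pc in range(12)]


def entropy_bits(hist):
    """Shannon entropy (in bits) of a normalized histogram; empty bins are skipped."""
    return -sum(p * math.log2(p) for p in hist if p > 0)


if __name__ == "__main__":
    # Toy melody: C4, C5, D4, E4 (two Cs, one D, one E).
    hist = pitch_class_histogram([60, 72, 62, 64])
    print(hist)
    print(entropy_bits(hist))
```

Comparing such a statistic between generated and real pieces is one simple way to check whether pitch usage differs: a histogram concentrated on few pitch classes gives low entropy, while an even spread over all 12 gives the maximum of log2(12) bits.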
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Label Smoothing · Dropout · Adam · Variational Dropout · Multi-Head Attention
