Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN

Anders Elowsson; Anders Friberg

arXiv:1906.07145 · cs.SD · June 18, 2019 · 5 cites

TL;DR

This paper introduces a CNN that predicts perceived minor/major modality in polyphonic music audio. It reaches an R² of about 0.71 and becomes invariant to key class through max pooling across pitch, with harmony analysis across chroma and time scales.

Contribution

The paper presents a CNN architecture that achieves key-class invariance through max pooling across pitch and performs harmony analysis across chroma and time scales, improving the prediction of perceived modality in polyphonic music.

Findings

01 Achieved an R² of about 0.71 in modality prediction

02 Outperformed previous systems and individual human listeners

03 Demonstrated the importance of long-scale pitch processing and pooling

Abstract

This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured so that the first CNN layer can compute two pitch chromas focused on different octaves. The following layers perform harmony analysis across chroma and time scales. Through max pooling across pitch, the CNN becomes invariant with regard to the key class (i.e., key disregarding mode) of the music. A multilayer perceptron combines the modality activation output with spectral features for the final prediction. The study uses a dataset of 203 excerpts, each rated by around 20 listeners; this small, challenging dataset requires carefully designed parameter sharing. With an R² of about 0.71, the system clearly outperforms previous systems as well as individual human listeners. A…
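The key-class invariance described in the abstract can be illustrated with a toy example. The sketch below is not the authors' network: it assumes a single 12-bin chroma vector, two hand-made triad templates in place of learned filters, and a hypothetical `pooled_modality` score. It only shows why taking the maximum over all pitch-class positions makes the output independent of transposition.

```python
import numpy as np

# Hypothetical 12-bin templates (index 0 = root); stand-ins for learned filters.
MAJOR_TRIAD = np.zeros(12)
MAJOR_TRIAD[[0, 4, 7]] = 1.0
MINOR_TRIAD = np.zeros(12)
MINOR_TRIAD[[0, 3, 7]] = 1.0


def circular_responses(chroma, template):
    """Correlate a chroma vector with a template at all 12 circular shifts."""
    return np.array([np.dot(chroma, np.roll(template, k)) for k in range(12)])


def pooled_modality(chroma):
    """Max-pool each template's responses across pitch class.

    Returns major minus minor evidence; the pooling discards which
    root (key class) produced the strongest response.
    """
    major = circular_responses(chroma, MAJOR_TRIAD).max()
    minor = circular_responses(chroma, MINOR_TRIAD).max()
    return major - minor


# Transposing the input (a circular shift of the chroma) leaves the score unchanged.
rng = np.random.default_rng(0)
chroma = rng.random(12)
scores = [pooled_modality(np.roll(chroma, t)) for t in range(12)]
print(np.allclose(scores, scores[0]))
```

A circular shift of the chroma only permutes the 12 template responses, so their maximum stays the same. In the paper's setting the filters are learned and operate across time as well, which this sketch does not attempt to reproduce.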

Taxonomy

Topics: Music and Audio Processing · Neuroscience and Music Perception · Music Technology and Sound Studies

Methods: Max Pooling