Estimating Musical Surprisal in Audio

Mathias Rose Bjare; Giorgia Cantisani; Stefan Lattner; Gerhard Widmer

arXiv:2501.07474 · cs.SD · January 14, 2025

TL;DR

This paper extends musical surprisal estimation from symbolic music to audio. It trains a Transformer model on compressed audio representations and relates the resulting information content (IC) to musical features and EEG responses.
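IC is the negative log-probability that a model assigns to the event that actually occurs, given the preceding context. For discrete events, as in the symbolic-music setting the method extends, a minimal sketch computes it from one-step-ahead logits. The function name and the toy logits are hypothetical; any autoregressive model that outputs per-step logits could supply them.

```python
import numpy as np

def information_content(logits: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """IC in bits for each step t: -log2 p(x_t | x_<t).

    logits[t] holds unnormalised scores for step t, already conditioned
    on x_<t by some autoregressive model (hypothetical input).
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of the event that actually occurred at each step.
    chosen = log_probs[np.arange(len(targets)), targets]
    return -chosen / np.log(2.0)  # convert nats to bits

# Toy example: 3 steps over a 4-symbol vocabulary.
logits = np.array([[2.0, 0.0, 0.0, 0.0],   # fairly confident in symbol 0
                   [0.0, 0.0, 0.0, 0.0],   # uniform prediction
                   [0.0, 0.0, 5.0, 0.0]])  # very confident in symbol 2
targets = np.array([0, 3, 2])
ic = information_content(logits, targets)
```

The uniform step yields exactly 2 bits (log2 of 4 symbols), while confidently predicted events yield IC close to zero.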

Contribution

It introduces a new audio-based surprisal estimation method that applies autoregressive Transformers to latent audio representations, bridging computational models and human perception.
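Compressed audio latents are continuous vectors rather than discrete tokens, so IC becomes a negative log-density instead of a negative log-probability. This page does not state which output distribution the Transformer uses; the sketch below assumes a diagonal Gaussian per latent frame, with the predicted mean `mu` and standard deviation `sigma` as hypothetical model outputs.

```python
import numpy as np

def gaussian_ic(z: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Per-frame IC in nats: -log N(z_t; mu_t, diag(sigma_t^2)).

    z, mu and sigma have shape (T, D): T latent frames with D dimensions.
    ASSUMPTION: mu and sigma are a model's predictions made from frames < t.
    """
    var = sigma ** 2
    log_density = -0.5 * (np.log(2.0 * np.pi * var) + (z - mu) ** 2 / var)
    # Dimensions are treated as independent, so log-densities add up.
    return -log_density.sum(axis=-1)

rng = np.random.default_rng(0)
T, D = 5, 8
mu = rng.normal(size=(T, D))
sigma = np.full((T, D), 0.5)
z_expected = mu.copy()            # frames exactly as predicted
z_surprising = mu + 2.0 * sigma   # frames two standard deviations off in every dim
ic_expected = gaussian_ic(z_expected, mu, sigma)
ic_surprising = gaussian_ic(z_surprising, mu, sigma)
```

Unlike discrete IC, a negative log-density can be negative when the predicted variance is small, so under this assumption relative IC values (across frames or conditions) are the meaningful quantity.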

Findings

01

IC decreases with repetitions, indicating learning.

02

Segment types that appear later in a piece have a higher mean IC, and IC correlates with certain musical features.

03

IC can predict EEG responses to music.
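The learning effect in the first finding can be illustrated with a much simpler model. In the sketch below, a hypothetical add-one-smoothed bigram model, updated after every event, stands in for the Transformer's use of earlier context; it is not the paper's model. When a motif repeats, its transitions have already been seen, so the mean IC of the second occurrence falls.

```python
import math
from collections import defaultdict

def online_bigram_ic(seq, vocab_size):
    """IC in bits of each event under an add-one-smoothed bigram model
    that is updated after every event (toy stand-in for in-context learning)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    ics = []
    prev = None  # no context before the first event
    for x in seq:
        p = (counts[prev][x] + 1) / (totals[prev] + vocab_size)
        ics.append(-math.log2(p))
        # Update the model only after scoring the event.
        counts[prev][x] += 1
        totals[prev] += 1
        prev = x
    return ics

motif = [0, 2, 1, 3, 2, 0, 1]
ics = online_bigram_ic(motif * 2, vocab_size=4)
first = sum(ics[:len(motif)]) / len(motif)    # mean IC, first occurrence
second = sum(ics[len(motif):]) / len(motif)   # mean IC, repetition
```

Comparing `first` and `second` is one simple way to quantify a decrease in IC with repetition.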

Abstract

In modeling musical surprisal expectancy with computational methods, it has been proposed to use the information content (IC) of one-step predictions from an autoregressive model as a proxy for surprisal in symbolic music. With an appropriately chosen model, the IC of musical events has been shown to correlate with human perception of surprise and complexity aspects, including tonal and rhythmic complexity. This work investigates whether an analogous methodology can be applied to music audio. We train an autoregressive Transformer model to predict compressed latent audio representations of a pretrained autoencoder network. We verify learning effects by estimating the decrease in IC with repetitions. We investigate the mean IC of musical segment types (e.g., A or B) and find that segment types appearing later in a piece have a higher IC than earlier ones on average. We investigate the…
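The segment-type analysis described in the abstract needs per-frame IC values grouped by segment label and ordered by where each type first appears. A minimal sketch follows. The function name, the labels and the IC values are hypothetical and made up only to exercise the code; they are not results from the paper.

```python
def mean_ic_by_segment_type(ic, labels):
    """Mean IC per segment type, ordered by each type's first appearance.

    ic[i] is the IC of frame i and labels[i] its segment type (e.g. "A", "B");
    both are hypothetical inputs, e.g. model output plus a structural annotation.
    """
    sums = {}    # dicts keep insertion order, i.e. order of first appearance
    counts = {}
    for value, label in zip(ic, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return [(label, sums[label] / counts[label]) for label in sums]

# Made-up numbers purely to exercise the function.
ic = [3.0, 2.0, 4.0, 5.0, 1.0, 2.0, 6.0]
labels = ["A", "A", "B", "B", "A", "A", "C"]
result = mean_ic_by_segment_type(ic, labels)
```

Relating each mean to its position in the first-appearance order is one way to probe whether later segment types carry higher IC; this sketch does not reproduce the paper's statistical analysis.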

Code & Models

Repositories

sonycslparis/audioic
pytorch · Official

Taxonomy

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer