Attention as a Perspective for Learning Tempo-invariant Audio Queries

Matthias Dorfer; Jan Hajič Jr.; Gerhard Widmer

arXiv:1809.05689 · cs.SD · September 18, 2018 · 1 citation

Open Access

TL;DR

This paper adds a soft attention mechanism to audio–sheet music retrieval models to improve tempo invariance, enabling more accurate retrieval across performances played at different tempos.

Contribution

It proposes letting the audio encoder learn, via soft attention, which parts of its fixed-size input window to encode, so that audio query embeddings become less sensitive to the tempo of the performance.

Findings

01

Attention improves retrieval accuracy.

02

The learned attention behaves in line with musical intuition.

03

Empirical results on classical piano music show retrieval performance gains.

Abstract

Current models for audio–sheet music retrieval via multimodal embedding space learning use convolutional neural networks with a fixed-size window for the input audio. Depending on the tempo of a query performance, this window captures more or less musical content, while notehead density in the score is largely tempo-independent. In this work we address this disparity with a soft attention mechanism, which allows the model to encode only those parts of an audio excerpt that are most relevant with respect to efficient query codes. Empirical results on classical piano music indicate that attention is beneficial for retrieval performance, and exhibits intuitively appealing behavior.
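The tempo disparity and the attention idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical window of 100 spectrogram frames at 20 frames per second (illustrative values, not taken from the paper) and implements generic softmax attention pooling over time frames. In a real model the per-frame scores would come from a learned network. This is not the authors' exact architecture, and the function names `beats_in_window`, `softmax`, and `attention_pool` are hypothetical.

```python
import math


def beats_in_window(n_frames, frames_per_second, tempo_bpm):
    """Number of beats covered by a fixed window of n_frames audio frames.

    Illustrates why a fixed-size audio window holds more or less musical
    content depending on the tempo of the performance.
    """
    seconds = n_frames / frames_per_second
    return seconds * tempo_bpm / 60.0


def softmax(scores):
    """Numerically stable softmax over a list of per-frame scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Soft attention pooling: weighted sum of per-frame feature vectors.

    frame_scores are assumed to come from some learned scoring network
    (hypothetical here); the softmax turns them into weights that sum to 1,
    so frames with low weight contribute little to the pooled code.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_features):
        for i in range(dim):
            pooled[i] += w * feat[i]
    return pooled, weights


if __name__ == "__main__":
    # Same hypothetical window, two tempos: the faster performance packs
    # twice as many beats into the window.
    print(beats_in_window(100, 20, 60))   # 5.0
    print(beats_in_window(100, 20, 120))  # 10.0
    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(weights, pooled)  # [0.5, 0.5] [0.5, 0.5]
```

Because the weights are normalized, the pooled vector stays on the same scale regardless of how many frames the window holds. A scoring network can therefore down-weight frames it considers irrelevant, which is the intuition the abstract describes for handling faster or slower performances.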

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing