Self-Attention for Audio Super-Resolution

Nathanaël Carraz Rakotonirina

arXiv:2108.11637 · cs.SD · August 27, 2021

TL;DR

This paper introduces an audio super-resolution network that combines convolutional layers with self-attention. Convolutions operate only locally, so self-attention is added to capture long-range dependencies; because it replaces recurrent layers, the model also parallelizes better and trains faster.

Contribution

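It proposes an architecture that integrates self-attention into a convolutional network and introduces Attention-based Feature-Wise Linear Modulation (AFiLM), which uses self-attention instead of recurrent neural networks to modulate the activations of the convolutional model.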

Findings

01 Outperforms existing methods on standard benchmarks

02 Enables more parallelization and faster training

03 Effectively models long-range dependencies in audio sequences

Abstract

Convolutions operate only locally, thus failing to model global interactions. Self-attention, however, can learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
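To make the idea of attention-based feature-wise modulation concrete, here is a minimal sketch in plain Python. It is not the author's implementation. Several details are assumptions made for illustration and are not stated in the abstract: the split of the activations into fixed-size blocks along time, the max pooling of each block, the single attention head with identity projections (no learned weights), and the purely multiplicative modulation. The names `self_attention`, `afilm_like`, and `block_size` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), so nothing is learned here.
    x is a list of B vectors of length C; the result has the same shape.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Every position attends to every other position in one step,
        # which is what lets attention capture long-range dependencies.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def afilm_like(features, block_size):
    """Modulate T x C activations with per-block scales from self-attention.

    Assumed steps: split time into blocks, max-pool each block per channel,
    run self-attention over the pooled blocks, then scale each block.
    """
    T, C = len(features), len(features[0])
    if T % block_size != 0:
        raise ValueError("T must be divisible by block_size")
    blocks = [features[i:i + block_size] for i in range(0, T, block_size)]
    pooled = [[max(row[c] for row in block) for c in range(C)] for block in blocks]
    scales = self_attention(pooled)  # one scale vector of length C per block
    out = []
    for block, scale in zip(blocks, scales):
        for row in block:
            out.append([scale[c] * row[c] for c in range(C)])
    return out


if __name__ == "__main__":
    activations = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    modulated = afilm_like(activations, block_size=2)
    print(len(modulated), len(modulated[0]))
```

Unlike a recurrent layer, the attention step has no sequential dependency between blocks, so all block scales can be computed at once; this is the property the abstract credits for the faster training.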

Code & Models

Repositories

ncarraz/AFILM (TensorFlow, official)

Taxonomy

Methods: 1x1 Convolution