# Spatio-Temporal Attention Pooling for Audio Scene Classification

**Authors:** Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins

arXiv: 1904.03543 · 2019-07-01

## TL;DR

This paper introduces a novel spatio-temporal attention pooling mechanism combined with a convolutional recurrent neural network to improve acoustic scene classification, achieving state-of-the-art results on the LITIS Rouen dataset.

## Contribution

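The paper proposes an attention pooling layer that weighs the recurrent output with a two-dimensional mask, formed as the outer product of learned spatial and temporal attention vectors. The mask emphasizes discriminative patterns and suppresses irrelevant ones before the output is pooled into a single feature vector for classification.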
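
The abstract describes the layer as forming a two-dimensional mask from the outer product of spatial and temporal attention vectors, then using it to weigh and pool the recurrent output. The sketch below illustrates that idea in plain Python under several assumptions that this summary does not confirm: the attention scores are passed in directly rather than produced by learned layers, each attention vector is softmax-normalized, and pooling sums the weighted output over time. The function names `softmax` and `attention_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(recurrent_out, temporal_scores, spatial_scores):
    """Pool a T x F recurrent output into one F-dimensional vector.

    Assumptions (not taken from the paper): scores are given as inputs,
    both attention vectors are softmax-normalized, pooling sums over time.
    """
    a_t = softmax(temporal_scores)  # one weight per time step, length T
    a_s = softmax(spatial_scores)   # one weight per feature, length F
    # Two-dimensional mask: outer product of the two attention vectors.
    mask = [[wt * ws for ws in a_s] for wt in a_t]
    # Weigh each entry by the mask, then sum over the time axis.
    pooled = [
        sum(mask[t][f] * recurrent_out[t][f] for t in range(len(a_t)))
        for f in range(len(a_s))
    ]
    return pooled, mask


# Toy input: 2 time steps, 2 features, uniform attention scores.
pooled, mask = attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], [0.0, 0.0])
```

Because the mask is an outer product of two normalized vectors, its entries sum to 1, and only T + F attention weights are needed instead of T × F.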

## Key findings

- The proposed network outperforms a strong convolutional neural network baseline.
- It sets a new state of the art on the LITIS Rouen dataset.
- The attention mask emphasizes discriminative spatio-temporal patterns while suppressing irrelevant ones.

## Abstract

Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The bidirectional recurrent layers are then able to encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask is formed via the outer product of the spatial and temporal attention vectors learned from two designated attention layers to weigh and pool the recurrent output into a final feature vector for classification. The network is trained with between-class examples generated from between-class data augmentation. Experiments demonstrate that the proposed method not only outperforms a strong convolutional neural network baseline but also sets new state-of-the-art performance on the LITIS Rouen dataset.
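
The between-class data augmentation mentioned above can be illustrated with a minimal sketch. It assumes the simplest form of the idea: two examples from different classes are blended with a random ratio, and their one-hot labels are blended with the same ratio. The exact mixing formula used in the paper is not given in this summary, so the linear blend and the name `mix_between_class` are assumptions.

```python
import random


def mix_between_class(x1, y1, x2, y2, ratio=None):
    """Blend two examples and their one-hot labels with a shared ratio.

    Assumption: plain linear mixing. The paper's exact formula may differ.
    """
    if ratio is None:
        ratio = random.random()  # uniform in [0, 1)
    mixed_x = [ratio * a + (1.0 - ratio) * b for a, b in zip(x1, x2)]
    mixed_y = [ratio * a + (1.0 - ratio) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y


# Toy example: two 2-dimensional inputs from classes 0 and 1.
mixed_x, mixed_y = mix_between_class(
    [1.0, 0.0], [1.0, 0.0],  # example and one-hot label of class 0
    [0.0, 1.0], [0.0, 1.0],  # example and one-hot label of class 1
    ratio=0.25,
)
```

The mixed label is a soft target, so a network trained on such examples would typically use a loss that accepts probability distributions rather than hard class indices.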

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03543/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1904.03543/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1904.03543/full.md

---
Source: https://tomesphere.com/paper/1904.03543