# City classification from multiple real-world sound scenes

**Authors:** Helen L. Bear, Toni Heittola, Annamaria Mesaros, Emmanouil Benetos, Tuomas Virtanen

arXiv: 1905.00979 · 2019-07-30

## TL;DR

This paper explores automatic city classification from sound scenes. A simple CNN reaches 50% accuracy, pairing city labels with grouped scenes raises this to 52%, and a multi-task learning formulation reaches 56%.

## Contribution

The paper introduces city classification from sound scenes as a novel task and formulates it in a multi-task learning framework that outperforms both a simple CNN and a variant trained on city labels paired with grouped scenes.

## Key findings

- Multi-task learning achieves 56% accuracy.
- Simple CNN baseline achieves 50% accuracy.
- Pairing city labels with grouped scene labels raises accuracy to 52%.
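
The label adaptation behind the 52% result can be illustrated with a minimal sketch. The scene names are taken to be the ten DCASE 2018 ASC classes; the three-way grouping (`indoor`, `outdoor`, `vehicle`), the `city/group` label format, and the helper names `paired_label`, `city_from_paired`, and `city_accuracy` are assumptions made for illustration, not details confirmed by this summary. Scoring a paired prediction by its city part is likewise an assumption.

```python
# Hypothetical grouping of scene classes into broader groups.
# The paper's actual grouping may differ.
SCENE_GROUPS = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "street_pedestrian": "outdoor",
    "public_square": "outdoor",
    "street_traffic": "outdoor",
    "park": "outdoor",
    "bus": "vehicle",
    "metro": "vehicle",
    "tram": "vehicle",
}


def paired_label(city, scene):
    """Build a joint training label from a city and its scene group."""
    return f"{city}/{SCENE_GROUPS[scene]}"


def city_from_paired(label):
    """Recover the city part of a paired label."""
    return label.split("/", 1)[0]


def city_accuracy(true_cities, predicted_paired):
    """Fraction of predictions whose city part matches the true city."""
    hits = sum(
        truth == city_from_paired(pred)
        for truth, pred in zip(true_cities, predicted_paired)
    )
    return hits / len(true_cities)
```

Under this scheme a classifier is trained on labels such as `helsinki/outdoor`, and a prediction counts as correct for the city task whenever its city part is right, even if the scene group is wrong.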

## Abstract

The majority of sound scene analysis work focuses on one of two clearly defined tasks: acoustic scene classification or sound event detection. Whilst this separation of tasks is useful for problem definition, it inherently ignores some subtleties of the real world, in particular how humans vary in how they describe a scene. Some will describe the weather and features within it, others will use a holistic descriptor like 'park', and others still will use unique identifiers such as cities or names. In this paper, we undertake the task of automatic city classification to ask: can we recognize a city from a set of sound scenes? In this problem each city has recordings from multiple scenes. We test a series of methods for this novel task and show that a simple convolutional neural network (CNN) can achieve an accuracy of 50%. This is less than the acoustic scene classification task baseline in the DCASE 2018 ASC challenge on the same data. With a simple adaptation to the class labels, pairing city labels with grouped scenes, accuracy increases to 52%, closer to the simpler scene classification task. Finally, we formulate the problem in a multi-task learning framework and achieve an accuracy of 56%, outperforming the aforementioned approaches.
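
As a rough illustration of the multi-task formulation, the sketch below combines a city cross-entropy term with an auxiliary scene cross-entropy term. It assumes the auxiliary task is scene classification and that the two losses are combined as a weighted sum; the function names and the `scene_weight` parameter are hypothetical, and the abstract does not state the paper's actual architecture or loss weighting.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(city_logits, scene_logits, city_target, scene_target,
                   scene_weight=0.5):
    """Weighted sum of the city loss and the auxiliary scene loss.

    scene_weight is an assumed hyperparameter, not a value from the paper.
    """
    city_loss = cross_entropy(city_logits, city_target)
    scene_loss = cross_entropy(scene_logits, scene_target)
    return city_loss + scene_weight * scene_loss
```

In a setup like this, both output heads would share the same convolutional feature extractor, so gradients from the scene term shape the features used for the city prediction.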

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00979/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00979/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1905.00979/full.md

---
Source: https://tomesphere.com/paper/1905.00979