# The Sound of Motions

**Authors:** Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

arXiv: 1904.05979 · 2019-04-16

## TL;DR

This paper introduces a motion-based audio-visual system, built on an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme, that improves sound localization and separation by using object motion cues. It outperforms appearance-based models at separating musical instrument sounds, including duets of instruments from the same category.

## Contribution

The paper presents a new end-to-end learnable model, DDT, and a curriculum learning scheme that together use motion cues from videos for sound separation, addressing the difficulty of separating sounds from instruments of the same category.

## Key findings

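- Explicit motion cues improve sound localization and separation for musical instruments.
- The system separates sound components from duets of the same instrument category, a problem the authors describe as previously unaddressed.
- In quantitative and qualitative evaluations, the motion-based system outperforms previous models that rely on visual appearance cues.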

## Abstract

Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT), and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.05979/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.05979/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/1904.05979/full.md

---
Source: https://tomesphere.com/paper/1904.05979