Local Visual Microphones: Improved Sound Extraction from Silent Video
Mohammad Amin Shabani, Laleh Samadfam, Mohammad Amin Sadeghi

TL;DR
This paper introduces an improved method for extracting sound from silent videos by analyzing local vibrations, achieving real-time performance and enabling sound direction estimation.
Contribution
It presents a novel approach that aggregates local vibrations for better sound quality, accounts for the delay sound incurs while traveling through the air, and speeds up sound extraction by two to three orders of magnitude, reaching real-time performance.
Findings
Enhanced sound quality over previous methods
Real-time sound extraction from 20 kHz video
Effective sound direction estimation
Abstract
Sound waves cause small vibrations in nearby objects. A few techniques exist in the literature that can extract sound from video. In this paper we study local vibration patterns at different image locations. We show that different locations in the image vibrate differently. We carefully aggregate local vibrations and produce a sound quality that improves on the state of the art. We show that local vibrations can have a time delay because sound waves take time to travel through the air. We use this phenomenon to estimate sound direction. We also present a novel algorithm that speeds up sound extraction by two to three orders of magnitude and reaches real-time performance on a 20 kHz video.
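To make the time-delay idea concrete, here is a minimal sketch of how a delay between two local vibration signals could be turned into a direction estimate. It is not the authors' algorithm: it assumes a plain cross-correlation peak for the delay and a far-field (plane-wave) model in which two image patches lie a known physical distance apart. The function names, the `spacing_m` parameter, and the synthetic signals are all hypothetical and chosen only for illustration.

```python
import numpy as np

def estimate_delay_samples(a, b):
    """Lag (in samples) by which signal b trails signal a.

    Uses the peak of the full cross-correlation; a positive result
    means b is a delayed copy of a.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    corr = np.correlate(b, a, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(a) - 1).
    return int(np.argmax(corr)) - (len(a) - 1)

def delay_to_angle_deg(lag_samples, fps, spacing_m, speed_of_sound=343.0):
    """Arrival angle (degrees from broadside) under a plane-wave assumption.

    spacing_m is the assumed physical distance between the two patches;
    speed_of_sound defaults to roughly 343 m/s (air at room temperature).
    """
    tau = lag_samples / fps                      # delay in seconds
    s = np.clip(speed_of_sound * tau / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Synthetic illustration: patch_b sees the same vibration 7 frames later.
rng = np.random.default_rng(0)
patch_a = rng.standard_normal(2000)
true_lag = 7
patch_b = np.concatenate([np.zeros(true_lag), patch_a[:-true_lag]])

lag = estimate_delay_samples(patch_a, patch_b)
angle = delay_to_angle_deg(lag, fps=20_000, spacing_m=0.5)
```

At a 20 kHz frame rate one frame lasts 50 microseconds, during which sound in air travels about 1.7 cm, so under these assumptions even patches a few centimeters apart can show a delay of a frame or more.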
