EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos

Aashish Rai; Srinath Sridhar

arXiv:2407.20592 · cs.CV · December 17, 2024

TL;DR

EgoSonics is a novel method that generates synchronized, semantically meaningful audio for silent egocentric videos, enabling new applications in VR and data augmentation.

Contribution

The paper combines latent diffusion models with SyncroNet, a control module built on ControlNet, to generate synchronized audio from silent egocentric videos, where prior work was limited to domains such as speech, music, or impact sounds.

Findings

01. Outperforms existing methods in audio quality
02. Achieves better synchronization in generated audio
03. Enhances video summarization tasks

Abstract

We introduce EgoSonics, a method to generate semantically meaningful and synchronized audio tracks conditioned on silent egocentric videos. Generating audio for silent egocentric videos could open new applications in virtual reality, assistive technologies, or for augmenting existing datasets. Existing work has been limited to domains like speech, music, or impact sounds and cannot capture the broad range of audio frequencies found in egocentric videos. EgoSonics addresses these limitations by building on the strengths of latent diffusion models for conditioned audio synthesis. We first encode and process paired audio-video data to make it suitable for generation. The encoded data is then used to train a model that can generate an audio track that captures the semantics of the input video. Our proposed SyncroNet builds on top of ControlNet to provide control signals that enable…

Taxonomy

Topics: Multimedia Communication and Technology · Music Technology and Sound Studies

Methods: Diffusion