Synchformer: Efficient Synchronization from Sparse Cues

Vladimir Iashin; Weidi Xie; Esa Rahtu; Andrew Zisserman

arXiv:2401.16423·cs.CV·January 30, 2024·1 cites

TL;DR

Synchformer introduces an efficient audio-visual synchronization model tailored for in-the-wild videos, leveraging contrastive pre-training to achieve state-of-the-art results in both dense and sparse cue scenarios.

Contribution

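The paper presents a novel audio-visual synchronization model trained with a decoupled approach, in which feature extraction is pre-trained separately from synchronization modelling. It extends training to large-scale 'in-the-wild' datasets, investigates evidence attribution for interpretability, and explores a new capability: audio-visual synchronizability.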

Findings

01. Achieves state-of-the-art performance in synchronization tasks.

02. Effective on both dense and sparse cues.

03. Extends to large-scale 'in-the-wild' datasets.

Abstract

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art performance in both dense and sparse settings. We also extend synchronization model training to AudioSet, a million-scale 'in-the-wild' dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.
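
To make the idea of segment-level contrastive pre-training more concrete, the following is a minimal sketch of a generic symmetric InfoNCE objective over paired audio and visual segment embeddings. It is not the authors' implementation: the abstract does not specify the loss, the encoders, or any hyperparameters, so the function names (`info_nce`, `segment_contrastive_loss`), the temperature value, and the toy 2-D embeddings are all assumptions made for illustration. The sketch assumes that the audio and visual embeddings at the same segment index form a positive pair and that all other segments act as negatives.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy in which anchors[i] should match candidates[i].

    Hypothetical helper: every other candidate is treated as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, cand) / temperature for cand in c]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def segment_contrastive_loss(audio_segments, visual_segments, temperature=0.1):
    """Symmetric loss: audio-to-visual and visual-to-audio, averaged."""
    a2v = info_nce(audio_segments, visual_segments, temperature)
    v2a = info_nce(visual_segments, audio_segments, temperature)
    return 0.5 * (a2v + v2a)


# Toy embeddings (assumed values): three segments, two dimensions each.
audio = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_visual = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Shifting the visual track by one segment breaks the temporal pairing.
shifted_visual = aligned_visual[1:] + aligned_visual[:1]

aligned_loss = segment_contrastive_loss(audio, aligned_visual)
shifted_loss = segment_contrastive_loss(audio, shifted_visual)
print(f"aligned: {aligned_loss:.4f}  shifted: {shifted_loss:.4f}")
```

In this toy setup the temporally aligned pairing yields a lower loss than the shifted one, which is the property such pre-training is meant to encourage in the feature extractors before a separate synchronization module is trained on top of them.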

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Photonic and Optical Devices

Methods: Focus