Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

Soo-Whan Chung; Joon Son Chung; Hong-Goo Kang

arXiv:1809.08001 · cs.CV · November 5, 2020

TL;DR

This paper introduces a novel multi-way matching strategy for learning cross-modal embeddings that significantly improves audio-visual synchronization and enables effective self-supervised visual speech recognition.

Contribution

The paper proposes a multi-way matching approach to cross-modal embedding learning that outperforms existing methods on audio-visual synchronization and enables self-supervised visual speech recognition.
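The difference between binary and multi-way matching can be illustrated with a small sketch. It assumes, as a simplification, that one video embedding is compared with N candidate audio embeddings (one in sync, the rest shifted), that the score for each candidate is its negative Euclidean distance, and that training minimises softmax cross-entropy over the N candidates. The function names and the plain-list vector representation are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def multiway_matching_loss(video: Sequence[float],
                           audios: List[Sequence[float]],
                           target: int) -> float:
    """Softmax cross-entropy over N candidate audio embeddings (a sketch).

    Assumption: the logit for each candidate is its negative Euclidean
    distance to the video embedding, so closer candidates score higher.
    `target` is the index of the in-sync audio candidate.
    """
    logits = [-euclidean(video, a) for a in audios]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability assigned to the in-sync candidate.
    return log_z - logits[target]


if __name__ == "__main__":
    video = [1.0, 0.0]
    audios = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # index 0 is in sync
    print(multiway_matching_loss(video, audios, target=0))  # smaller loss
    print(multiway_matching_loss(video, audios, target=2))  # larger loss
```

Unlike a binary matching/non-matching classifier, which scores each pair independently, this loss couples the candidates: shrinking the in-sync distance is not enough if an off-sync candidate is equally close.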

Findings

01

Outperforms existing baselines in synchronization accuracy

02

Embeddings enable self-supervised visual speech recognition

03

Self-supervised visual speech recognition matches fully-supervised performance

Abstract

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization. Here, we set up the problem as one of cross-modal retrieval, where the objective is to find the most relevant audio segment given a short video clip. The method builds on recent advances in learning representations from cross-modal self-supervision. The main contributions of this paper are as follows: (1) we propose a new learning strategy where the embeddings are learnt via a multi-way matching problem, as opposed to a binary classification (matching or non-matching) problem as proposed by recent papers; (2) we demonstrate that the performance of this method far exceeds the existing baselines on the synchronization task; (3) we use the learnt embeddings for self-supervised visual speech recognition, and show that the performance matches the representations learnt…
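Framed as retrieval, synchronization reduces to choosing the audio offset whose embeddings lie closest to the video embeddings. The sketch below assumes embeddings have already been computed for each video clip and for audio windows at a set of candidate frame offsets, and averages the distance over clips before picking the minimum. `estimate_offset` and the dictionary layout are hypothetical illustrations, not the paper's evaluation code.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_offset(video_clips: List[Sequence[float]],
                    audio_by_offset: Dict[int, List[Sequence[float]]]) -> int:
    """Return the candidate offset whose audio embeddings best match the video.

    Hypothetical layout: audio_by_offset[k][i] is the embedding of the audio
    window shifted by k frames relative to video clip i.
    """
    def mean_distance(k: int) -> float:
        dists = [euclidean(v, a) for v, a in zip(video_clips, audio_by_offset[k])]
        return sum(dists) / len(dists)

    # Retrieval step: the offset with the smallest mean distance wins.
    return min(audio_by_offset, key=mean_distance)


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]
    candidates = {
        -1: [[0.0, 1.0], [1.0, 0.0]],
        0: [[0.9, 0.1], [0.1, 0.9]],
        1: [[-1.0, 0.0], [0.0, -1.0]],
    }
    print(estimate_offset(clips, candidates))  # 0
```

Averaging over several clips before taking the minimum is one simple way to make the decision less sensitive to individual silent or ambiguous windows; it is a design choice of this sketch rather than a detail reported in the abstract.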
