XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

Cătălina Cangea; Petar Veličković; Pietro Liò

arXiv:1709.00572·stat.ML·September 6, 2023

TL;DR

XFlow introduces cross-modal deep neural networks that enable data exchange between audio and visual streams, improving audiovisual classification by exploiting correlations and achieving state-of-the-art results on multiple datasets.

Contribution

The paper presents novel cross-modality dataflow architectures and extends cross-connections to non-compatible data, enhancing multimodal learning capabilities.

Findings

1. The models outperform their baselines by up to 11.5%.

2. They achieve state-of-the-art results on the AVletters, CUAVE, and Digits datasets.

3. They learn interpretable features that improve discrimination ability.

Abstract

In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for example, correlations in the space or time domain across modalities - but should be wisely exploited in order to benefit from their full predictive potential. We propose two deep learning architectures with multimodal cross-connections that allow for dataflow between several feature extractors (XFlow). Our models derive more interpretable features and achieve better performances than models which do not exchange representations, usefully exploiting correlations between audio and visual data, which have a different dimensionality and are nontrivially exchangeable. Our work improves on existing multimodal deep learning algorithms in two essential ways: (1)…
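To make the idea of a cross-connection between streams of different dimensionality concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the abstract only says that representations are exchanged between feature extractors for audio and visual data. The sketch assumes a 2D visual feature map and a 1D audio feature vector, and it uses dense projections as the exchange mechanism, which is an assumption made for illustration. All function names, shapes, and weights are hypothetical.

```python
import random


def linear(x, weights):
    """Dense projection; weights is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_weights(out_dim, in_dim, rng):
    """Small random weights (hypothetical initialisation, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def flatten(grid):
    """Turn a 2D map (list of rows) into a 1D list."""
    return [v for row in grid for v in row]


def reshape(vec, rows, cols):
    """Turn a 1D list of length rows * cols into a 2D map."""
    return [vec[r * cols:(r + 1) * cols] for r in range(rows)]


def cross_connect(visual_map, audio_vec, w_v2a, w_a2v):
    """Exchange features between a 2D visual map and a 1D audio vector."""
    rows, cols = len(visual_map), len(visual_map[0])
    # Visual -> audio: flatten the map, project it, append to the audio features.
    visual_as_audio = linear(flatten(visual_map), w_v2a)
    audio_out = audio_vec + visual_as_audio
    # Audio -> visual: project the vector, reshape it to the map's size,
    # and stack it as a second channel next to the original map.
    audio_as_visual = reshape(linear(audio_vec, w_a2v), rows, cols)
    visual_out = [visual_map, audio_as_visual]
    return visual_out, audio_out


rng = random.Random(0)
visual_map = [[rng.random() for _ in range(4)] for _ in range(4)]  # 4x4 feature map
audio_vec = [rng.random() for _ in range(6)]                       # 6-dim feature vector
w_v2a = init_weights(3, 16, rng)   # 16 visual values -> 3 audio-side features
w_a2v = init_weights(16, 6, rng)   # 6 audio values -> 16 values, reshaped to 4x4

visual_out, audio_out = cross_connect(visual_map, audio_vec, w_v2a, w_a2v)
print(len(visual_out), len(visual_out[0]), len(visual_out[0][0]))  # channels, rows, cols
print(len(audio_out))                                              # audio feature count
```

After the exchange, each stream carries its own features plus a projected view of the other modality, so later layers in either stream can pick up on correlations between the two. In a trained network the projection weights would be learned jointly with the rest of the model.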

Code & Models

Repositories

catalina17/XFlow
tf
