What is Multimodality?

Letitia Parcalabescu; Nils Trost; Anette Frank

arXiv:2103.06304·cs.AI·August 23, 2021

TL;DR

This paper critiques outdated definitions of multimodality in machine learning, proposing a new task-relative framework that emphasizes relevant representations for specific tasks to advance language grounding and natural language understanding.

Contribution

It introduces a novel, task-focused definition of multimodality, addressing foundational gaps and guiding future research in multimodal machine learning.

Findings

01 Highlights limitations of existing definitions
02 Proposes a task-relative multimodality framework
03 Aims to improve language grounding and NLU

Abstract

The last years have shown rapid developments in the field of multimodal machine learning, combining e.g., vision, text or speech. In this position paper we explain how the field uses outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning that focuses on representations and information that are relevant for a given machine learning task. With our new definition of multimodality we aim to provide a missing foundation for multimodal research, an important component of language grounding and a crucial milestone towards NLU.

Videos

What nobody tells you about MULTIMODAL Machine Learning! 🙊 THE definition. · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques