Towards Learning Cross-Modal Perception-Trace Models

Achim Rettinger; Viktoria Bogdanova; Philipp Niemann

arXiv:1910.08549·cs.CL·October 22, 2019

TL;DR

This paper investigates human perception in multi-modal documents, develops a perception-trace model inspired by eye tracking data, and demonstrates its potential to enhance embedding quality across modalities.

Contribution

It introduces CMPM, a novel perception-trace model based on human eye tracking data, to improve multi-modal embeddings beyond traditional heuristics.

Findings

01. Perception-based models capture multi-modality and layout information.
02. CMPM improves basic skip-gram embeddings.
03. Human-inspired perception models have high potential for embedding enhancement.

Abstract

Representation learning is a key element of state-of-the-art deep learning approaches. It enables raw data to be transformed into structured vector space embeddings. Such embeddings are able to capture the distributional semantics of their context, e.g. by word windows on natural language sentences, graph walks on knowledge graphs or convolutions on images. So far, this context is manually defined, resulting in heuristics which are solely optimized for computational performance on certain tasks like link-prediction. However, such heuristic models of context are fundamentally different from how humans capture information. For instance, when reading a multi-modal webpage (i) humans do not perceive all parts of a document equally: some words and parts of images are skipped, others are revisited several times, which makes the perception trace highly non-sequential; (ii) humans construct meaning…
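
To make the contrast in the abstract concrete, the following minimal sketch compares skip-gram training pairs drawn from a fixed word window with pairs drawn from a perception trace. It is an illustration under stated assumptions, not the paper's CMPM: the function names `window_pairs` and `trace_pairs`, the `img:` prefix for image regions, and the example trace are all hypothetical, and the trace is simply assumed to be the sequence of fixated elements in viewing order.

```python
from typing import List, Tuple


def window_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs from a fixed symmetric window.

    This is the usual hand-defined context heuristic for skip-gram training.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def trace_pairs(trace: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs along a perception trace.

    Assumption: `trace` lists fixated elements (words or image regions) in
    the order a reader looked at them, so it may skip or revisit elements.
    The same windowing is applied, but over the trace instead of the text.
    """
    return window_pairs(trace, window)


# Text order of a short sentence.
sentence = ["the", "cat", "sat"]

# Hypothetical trace: the reader skips "the", jumps to an image region,
# returns to "cat", then continues reading.
trace = ["cat", "img:cat", "cat", "sat"]

text_context = window_pairs(sentence, window=1)
trace_context = trace_pairs(trace, window=1)
```

In this sketch the trace-based context contains the cross-modal pair `("cat", "img:cat")`, which a window over the text alone cannot produce, and it never pairs anything with the skipped word `"the"`.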

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks