# "Is this an example image?" -- Predicting the Relative Abstractness Level of Image and Text

**Authors:** Christian Otto, Sebastian Holzki, Ralph Ewerth

arXiv: 1901.07878 · 2019-01-31

## TL;DR

This paper introduces a deep learning method that predicts the relative abstractness level of an image-text pair, i.e., whether the image is an abstraction of the text or vice versa. The goal is a better automatic understanding of semantic cross-modal relations for multimodal search and retrieval.

## Contribution

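It proposes a new metric, the Abstractness Level (ABS), which captures whether an image is an abstraction of the accompanying text or vice versa. It also proposes an autoencoder-based deep learning approach to predict ABS that significantly reduces the amount of labeled training data required.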
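The autoencoder-based approach rests on a general idea: learn a compact representation from unlabeled data first, then fit a lightweight predictor using only a few labeled examples. The NumPy sketch below illustrates that idea under explicit assumptions. It is not the authors' architecture, which is described in the full paper. The toy 8-dimensional "image-text pair" features, the linear autoencoder trained by plain gradient descent, the nearest-centroid classifier, and the 0/1 label encoding are all hypothetical choices made for illustration.

```python
import numpy as np


def make_toy_pairs(n_per_class, rng):
    """Hypothetical stand-in for image-text pair features: two clusters in a
    2-D latent space, linearly embedded in 8 dimensions plus small noise."""
    centers = np.array([[2.0, 1.0], [-2.0, -1.0]])
    # Hypothetical label encoding: 0 = image more abstract, 1 = text more abstract.
    labels = np.repeat([0, 1], n_per_class)
    latent = centers[labels] + 0.5 * rng.normal(size=(labels.size, 2))
    basis = np.linalg.qr(rng.normal(size=(8, 2)))[0].T  # 2 x 8, orthonormal rows
    x = latent @ basis + 0.05 * rng.normal(size=(labels.size, 8))
    return x, labels


def train_linear_autoencoder(x, code_dim, lr=0.05, epochs=500, seed=0):
    """Minimize 0.5 * mean ||x W_enc W_dec - x||^2 with plain gradient descent.
    Uses only the (unlabeled) inputs; returns the weights and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = 0.1 * rng.normal(size=(d, code_dim))
    w_dec = 0.1 * rng.normal(size=(code_dim, d))
    losses = []
    for _ in range(epochs):
        z = x @ w_enc                  # encode
        err = z @ w_dec - x            # reconstruction error
        losses.append(0.5 * np.mean(np.sum(err**2, axis=1)))
        # Both gradients are computed before either weight matrix is updated.
        grad_dec = z.T @ err / n
        grad_enc = x.T @ (err @ w_dec.T) / n
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses


def nearest_centroid_fit(codes, labels):
    """Average the latent codes of the few labeled examples per class."""
    classes = np.unique(labels)
    return classes, np.stack([codes[labels == c].mean(axis=0) for c in classes])


def nearest_centroid_predict(codes, classes, centroids):
    """Assign each code to the class whose centroid is closest (Euclidean)."""
    dists = np.linalg.norm(codes[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


rng = np.random.default_rng(42)
x, y = make_toy_pairs(100, rng)

# Stage 1: unsupervised, uses all 200 pairs without labels.
w_enc, w_dec, losses = train_linear_autoencoder(x, code_dim=2)

# Stage 2: supervised, uses only 4 labeled pairs per class.
codes = x @ w_enc
labeled = np.concatenate([np.arange(4), 100 + np.arange(4)])
classes, centroids = nearest_centroid_fit(codes[labeled], y[labeled])
pred = nearest_centroid_predict(codes, classes, centroids)
accuracy = float(np.mean(pred == y))
print(f"loss {losses[0]:.3f} -> {losses[-1]:.4f}, accuracy {accuracy:.2f}")
```

Here the labeled step sees only 8 of the 200 pairs. In this toy setting that is typically sufficient because the autoencoder, trained without labels, already exposes the direction along which the two clusters differ. A real system would replace the linear encoder with a deep model over image and text features.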

## Key findings

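- Experiments on a challenging test set demonstrate the feasibility of predicting the relative abstractness level.
- Introduces ABS, a new metric for the relative abstractness level of image and text.
- Uses an autoencoder architecture to significantly reduce the amount of labeled training data required.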

## Abstract

Successful multimodal search and retrieval requires the automatic understanding of semantic cross-modal relations, which, however, is still an open research problem. Previous work has suggested the metrics cross-modal mutual information and semantic correlation to model and predict cross-modal semantic relations of image and text. In this paper, we present an approach to predict the (cross-modal) relative abstractness level of a given image-text pair, that is whether the image is an abstraction of the text or vice versa. For this purpose, we introduce a new metric that captures this specific relationship between image and text at the Abstractness Level (ABS). We present a deep learning approach to predict this metric, which relies on an autoencoder architecture that allows us to significantly reduce the required amount of labeled training data. A comprehensive set of publicly available scientific documents has been gathered. Experimental results on a challenging test set demonstrate the feasibility of the approach.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.07878/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1901.07878/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1901.07878/full.md

---
Source: https://tomesphere.com/paper/1901.07878