Probing Representations Learned by Multimodal Recurrent and Transformer Models

Jindřich Libovický; Pranava Madhyastha

arXiv:1908.11125·cs.CL·August 30, 2019

TL;DR

This study compares how recurrent and Transformer models learn sentence representations when the training signal comes from different modalities. It finds that RNNs do better on semantic relevance tasks, while Transformers achieve higher machine translation quality.

Contribution

It provides a comprehensive analysis of the representational properties of multimodal models trained with various signals, highlighting differences between architectures.

Findings

01

RNN-based models outperform Transformers on semantic relevance tasks.

02

Transformers achieve higher quality in machine translation.

03

Visual grounding provides stronger training signals than language modeling.

Abstract

Recent literature shows that large-scale language modeling provides excellent reusable sentence representations with both recurrent and self-attentive architectures. However, there has been less clarity on the commonalities and differences in the representational properties induced by the two architectures. It has also been shown that visual information serves as one of the means for grounding sentence representations. In this paper, we present a meta-study assessing the representational quality of models where the training signal is obtained from different modalities, in particular, language modeling, image feature prediction, and both textual and multimodal machine translation. We evaluate textual and visual features of sentence representations obtained using predominant approaches on image retrieval and semantic textual similarity. Our experiments reveal that on moderate-sized…
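As a rough illustration of what a semantic textual similarity evaluation of sentence representations can look like, the sketch below scores sentence pairs by cosine similarity and correlates those scores with human judgments. This is a common setup, not necessarily the exact protocol used in the paper; the function names (`cosine`, `pearson`, `sts_score`), the two-dimensional vectors, and the gold scores are all made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sts_score(pairs, gold):
    """Correlate cosine similarities of sentence-vector pairs with gold scores.

    `pairs` is a list of (vector_a, vector_b) tuples; `gold` holds one
    human similarity judgment per pair. Both are hypothetical inputs here.
    """
    predicted = [cosine(u, v) for u, v in pairs]
    return pearson(predicted, gold)


# Toy data (invented for illustration, not taken from the paper).
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction
    ([1.0, 0.0], [1.0, 1.0]),  # partially aligned
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
gold = [5.0, 3.0, 0.0]

score = sts_score(pairs, gold)
print(f"Pearson correlation with gold scores: {score:.3f}")
```

In practice the vectors would come from the encoder being probed (for example, a pooled RNN or Transformer state), and a higher correlation suggests that distances in the representation space track human similarity judgments more closely.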

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax