Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark

Pritish Sahu; Karan Sikka; Ajay Divakaran

arXiv:2110.11899·cs.CV·October 25, 2021

Open Access

TL;DR

This paper introduces Meta-RecipeQA, a new benchmark for multimodal machine comprehension that systematically addresses dataset biases and evaluates models' generalization abilities across varying difficulty levels.

Contribution

It proposes a framework with control knobs to generate diverse datasets, introduces a hierarchical transformer reasoning network, and provides a comprehensive evaluation of models on the new benchmark.

Findings

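01. HTRN, the proposed hierarchical transformer reasoning network, outperforms SOTA models by ~18% on the Visual Cloze task.

02. Models perform worse on Meta-RecipeQA than on RecipeQA, indicating that Meta-RecipeQA is less biased.

03. The benchmark measures generalization capabilities across datasets of progressive difficulty.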

Abstract

We focus on Multimodal Machine Reading Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context), and the context and the questions can be in different modalities. Previous works such as RecipeQA have proposed datasets and cloze-style tasks for evaluation. However, we identify three critical biases stemming from the question-answer generation process and the memorization capabilities of large deep models. These biases make it easier for a model to overfit by relying on spurious correlations or naive data patterns. We propose a systematic framework to address these biases through three Control-Knobs that enable us to generate a test bed of datasets of progressive difficulty levels. We believe that our benchmark (referred to as Meta-RecipeQA) will provide, for the first time, a fine-grained estimate of a model's generalization capabilities. We also…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Test