Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark
Pritish Sahu, Karan Sikka, Ajay Divakaran

TL;DR
This paper introduces Meta-RecipeQA, a new benchmark for multimodal machine comprehension that systematically addresses dataset biases and evaluates models' generalization abilities across varying difficulty levels.
Contribution
It proposes a framework with control knobs to generate diverse datasets, introduces a hierarchical transformer reasoning network, and provides a comprehensive evaluation of models on the new benchmark.
Findings
HTRN, the proposed hierarchical transformer reasoning network, outperforms state-of-the-art models by ~18% on the Visual Cloze task.
Models perform worse on Meta-RecipeQA than on RecipeQA, indicating that the new benchmark is less biased.
The benchmark effectively measures generalization capabilities.
Abstract
We focus on Multimodal Machine Reading Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context), and the context and the questions can be in different modalities. Previous works such as RecipeQA have proposed datasets and cloze-style tasks for evaluation. However, we identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models. These biases make it easier for a model to overfit by relying on spurious correlations or naive data patterns. We propose a systematic framework to address these biases through three Control-Knobs that enable us to generate a test bed of datasets of progressive difficulty levels. We believe that our benchmark (referred to as Meta-RecipeQA) will provide, for the first time, a fine-grained estimate of a model's generalization capabilities. We also…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Test
