Revisiting Visual Question Answering Baselines
Allan Jabri, Armand Joulin, Laurens van der Maaten

TL;DR
This paper introduces a simple binary classification model for VQA that predicts correctness of image-question-answer triplets, achieving competitive results and challenging the necessity of complex reasoning mechanisms in current systems.
Contribution
It proposes a straightforward binary classification approach for VQA, demonstrating competitive performance and questioning the reliance on attention and memory mechanisms.
Findings
The best model achieves state-of-the-art results on the Visual7W Telling task.
It performs competitively on the VQA Real Multiple Choice task.
Current VQA systems may rely heavily on dataset biases.
Abstract
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support "reasoning". For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art…
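The binary formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear scorer, the tiny feature dimension, the random weights, and the function names `score_triplet` and `pick_answer` are all assumptions made for illustration, and this excerpt does not specify the features or architecture the paper uses. The sketch only shows the shape of the idea: the answer is an input alongside the image and question, the model outputs a probability that the triplet is correct, and a multiple-choice question is answered by scoring each candidate separately.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_triplet(image_feat, question_feat, answer_feat, weights, bias):
    """Estimate the probability that an image-question-answer triplet is correct.

    Hypothetical linear scorer: the three feature vectors are concatenated
    and passed through a single logistic unit.
    """
    x = image_feat + question_feat + answer_feat  # list concatenation
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return sigmoid(z)


def pick_answer(image_feat, question_feat, candidate_feats, weights, bias):
    """Score every candidate answer independently and return the best index."""
    scores = [
        score_triplet(image_feat, question_feat, a, weights, bias)
        for a in candidate_feats
    ]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy data: random vectors stand in for real image, question, and answer features.
rng = random.Random(0)
DIM = 4  # assumed feature size, chosen only to keep the example small
image_feat = [rng.gauss(0, 1) for _ in range(DIM)]
question_feat = [rng.gauss(0, 1) for _ in range(DIM)]
candidate_feats = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
weights = [rng.gauss(0, 1) for _ in range(3 * DIM)]  # untrained, illustrative
bias = 0.0

best_index, scores = pick_answer(
    image_feat, question_feat, candidate_feats, weights, bias
)
print("chosen answer:", best_index)
print("scores:", [round(s, 3) for s in scores])
```

In contrast to a multi-class classifier over a fixed answer vocabulary, each candidate here is scored on its own, so the same scorer applies to any set of answer choices for which features can be computed.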
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
