Tutorial on Answering Questions about Images with Deep Learning
Mateusz Malinowski, Mario Fritz

TL;DR
This tutorial introduces neural network architectures for answering questions about images, leveraging datasets like DAQUAR and VQA, and demonstrates how to achieve competitive performance using LSTM and CNN models.
Contribution
It provides a practical guide to building deep learning models for visual question answering, with adaptable architectures and insights into dataset handling.
Findings
Models achieve competitive performance on DAQUAR and VQA datasets.
Combining LSTM with CNN representations is effective for visual question answering.
Tutorial enables readers to implement and improve deep learning models for this task.
Abstract
Together with the development of more accurate methods in Computer Vision and Natural Language Understanding, holistic architectures that answer on questions about the content of real-world images have emerged. In this tutorial, we build a neural-based approach to answer questions about images. We base our tutorial on two datasets: (mostly on) DAQUAR, and (a bit on) VQA. With small tweaks the models that we present here can achieve a competitive performance on both datasets, in fact, they are among the best methods that use a combination of LSTM with a global, full frame CNN representation of an image. We hope that after reading this tutorial, the reader will be able to use Deep Learning frameworks, such as Keras and introduced Kraino, to build various architectures that will lead to a further performance improvement on this challenging task.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
