VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering

Marc Bolaños; Álvaro Peris; Francisco Casacuberta; Petia Radeva

arXiv:1612.03628·cs.CV·December 13, 2016

TL;DR

VIBIKNet combines Kernelized CNNs and LSTMs to answer questions about images, trading off accuracy against computational load (memory and time). It is validated on the VQA challenge dataset.

Contribution

The paper introduces VIBIKNet, a model that integrates Kernelized CNNs with LSTM units for visual question answering and aims to balance accuracy against computational load.

Findings

1. VIBIKNet achieves competitive accuracy on the VQA challenge dataset.

2. It offers a favorable trade-off between accuracy and computational load, measured in memory and time consumption.

3. It is compared against top-performing methods on both accuracy and speed.

Abstract

In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top-performing methods in order to illustrate its performance and speed.

Code & Models

Repositories

MarcBS/VIBIKNet
Official
