Hadamard Product for Low-rank Bilinear Pooling
Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

TL;DR
This paper introduces low-rank bilinear pooling based on the Hadamard (element-wise) product and uses it to build an efficient attention mechanism for multimodal learning. The method achieves state-of-the-art results on visual question-answering (VQA) with a smaller model than compact bilinear pooling.
Contribution
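The paper proposes low-rank bilinear pooling with the Hadamard product: the bilinear weight matrix is factorized into two low-rank projections, and the projected inputs are combined element-wise. This avoids the high-dimensional representations of full bilinear models and outperforms compact bilinear pooling on VQA with fewer parameters.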
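The factorization behind this contribution can be written out as a short derivation. This is a sketch of the standard low-rank argument, not a quotation from the paper; the symbols (inputs $\mathbf{x} \in \mathbb{R}^{N}$ and $\mathbf{y} \in \mathbb{R}^{M}$, rank $d$, factor matrices $\mathbf{U}_i$, $\mathbf{V}_i$, output projection $\mathbf{P}$) are notation assumed here for illustration.

```latex
% One output unit of a bilinear model, with W_i factorized as U_i V_i^T
\begin{align}
f_i &= \mathbf{x}^\top \mathbf{W}_i \mathbf{y} + b_i,
      \qquad \mathbf{W}_i \in \mathbb{R}^{N \times M} \\
    &= \mathbf{x}^\top \mathbf{U}_i \mathbf{V}_i^\top \mathbf{y} + b_i,
      \qquad \mathbf{U}_i \in \mathbb{R}^{N \times d},\;
             \mathbf{V}_i \in \mathbb{R}^{M \times d} \\
    &= \mathbb{1}^\top \left( \mathbf{U}_i^\top \mathbf{x} \circ
                              \mathbf{V}_i^\top \mathbf{y} \right) + b_i
\end{align}

% Sharing U and V across outputs and replacing the all-ones vector
% with a learned projection P gives a vector-valued output:
\begin{equation}
\mathbf{f} = \mathbf{P}^\top \left( \mathbf{U}^\top \mathbf{x} \circ
                                    \mathbf{V}^\top \mathbf{y} \right) + \mathbf{b},
\qquad \mathbf{P} \in \mathbb{R}^{d \times C}
\end{equation}
```

Here $\circ$ denotes the Hadamard product and $\mathbb{1}$ is a vector of ones. The last line of the first block follows because summing the element-wise product of two vectors is the same as taking their inner product.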
Findings
Outperforms compact bilinear pooling in VQA tasks
Achieves state-of-the-art results on VQA dataset
Offers a more parsimonious and computationally efficient model
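To make the parsimony claim concrete, the following sketch counts weights for a full bilinear layer versus a low-rank factorization with shared projections. The function names and the dimensions used are hypothetical and chosen only for illustration; they are not the paper's configuration, and bias terms are ignored.

```python
def full_bilinear_params(n: int, m: int, c: int) -> int:
    """Weights in a full bilinear layer: one n-by-m matrix per output unit."""
    return n * m * c


def low_rank_params(n: int, m: int, c: int, d: int) -> int:
    """Weights in a rank-d factorization with shared projections:
    U is n-by-d, V is m-by-d, and the output projection P is d-by-c."""
    return d * (n + m) + d * c


if __name__ == "__main__":
    # Hypothetical sizes: image feature, question feature, outputs, rank.
    n, m, c, d = 2048, 2400, 1000, 1200
    full = full_bilinear_params(n, m, c)
    low = low_rank_params(n, m, c, d)
    print(f"full bilinear: {full:,} weights")
    print(f"low-rank:      {low:,} weights")
    print(f"ratio:         {full / low:.1f}x")
```

The low-rank count grows linearly in the input sizes rather than with their product, so the saving depends on how small the rank `d` can be kept without hurting accuracy.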
Abstract
Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property.
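The core identity behind low-rank bilinear pooling can be checked numerically. The sketch below is not the authors' implementation: it uses plain Python lists, random weights, and hypothetical helper names (`matvec_t`, `full_bilinear`, `low_rank_bilinear`), and it omits the nonlinearities, output projection, and attention mechanism. It only shows that a bilinear form with a rank-d weight matrix equals the sum of a Hadamard product of two projections.

```python
import random


def matvec_t(mat, vec):
    """Return mat^T @ vec, where mat is a list of rows (len(vec) rows)."""
    cols = len(mat[0])
    return [sum(mat[r][c] * vec[r] for r in range(len(vec))) for c in range(cols)]


def full_bilinear(x, U, V, y):
    """Compute x^T (U V^T) y by materializing the N-by-M matrix W = U V^T."""
    n, m, d = len(U), len(V), len(U[0])
    W = [[sum(U[i][k] * V[j][k] for k in range(d)) for j in range(m)]
         for i in range(n)]
    return sum(x[i] * W[i][j] * y[j] for i in range(n) for j in range(m))


def low_rank_bilinear(x, U, V, y):
    """Compute 1^T (U^T x ∘ V^T y): project both inputs to d dimensions,
    multiply element-wise (Hadamard product), then sum."""
    ux = matvec_t(U, x)
    vy = matvec_t(V, y)
    return sum(a * b for a, b in zip(ux, vy))


if __name__ == "__main__":
    rng = random.Random(0)
    N, M, D = 5, 4, 3  # hypothetical input sizes and rank
    x = [rng.uniform(-1, 1) for _ in range(N)]
    y = [rng.uniform(-1, 1) for _ in range(M)]
    U = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
    V = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(M)]
    diff = abs(full_bilinear(x, U, V, y) - low_rank_bilinear(x, U, V, y))
    print("values agree:", diff < 1e-9)
```

The low-rank version never builds the N-by-M matrix, which is where the reduction in memory and computation comes from.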
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
