Trying Bilinear Pooling in Video-QA

Thomas Winterbottom; Sarah Xiao; Alistair McLean; Noura Al Moubayed

arXiv:2012.10285 · cs.CV · December 21, 2020 · 1 cites

Open Access

TL;DR

This paper investigates the application of bilinear pooling (BLP) techniques to video question answering (video-QA), finding that simple integration often harms performance and providing insights into the challenges of using BLP in this domain.

Contribution

The study applies BLP methods to various video-QA benchmarks, revealing their limited effectiveness and offering best practices for future application in video-QA tasks.

Findings

01 BLP integration generally harms video-QA performance

02 Theoretical analysis explains challenges of BLP in video-QA

03 Recommendations for effective BLP application in video-QA

Abstract

Bilinear pooling (BLP) refers to a family of operations recently developed for fusing features from different modalities, predominantly for visual question answering (VQA) models. A bilinear (outer-product) expansion is thought to encourage models to learn interactions between two feature spaces and has experimentally outperformed 'simpler' vector operations (concatenation and element-wise addition/multiplication) on VQA benchmarks. Successive BLP techniques have yielded higher performance with lower computational expense and are often implemented alongside attention mechanisms. However, despite significant progress in VQA, BLP methods have not been widely applied to the more recently explored video question answering (video-QA) tasks. In this paper, we begin to bridge this research gap by applying BLP techniques to various video-QA benchmarks, namely TVQA, TGIF-QA, Ego-VQA and MSVD-QA. We share our results…
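
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of concatenation, a full bilinear (outer-product) fusion, and a generic low-rank factorised bilinear fusion. It is an illustration only, not the paper's implementation: the feature sizes, the random weights, and the function names (`concat_fusion`, `full_bilinear`, `outer_product_fusion`, `low_rank_bilinear`) are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: two input feature dims, output dim, and factorisation rank.
m, n, o, rank = 8, 6, 4, 3

x = rng.standard_normal(m)  # e.g. a visual feature vector (assumed)
y = rng.standard_normal(n)  # e.g. a question feature vector (assumed)


def concat_fusion(x, y):
    """'Simpler' fusion: stack the two vectors, no multiplicative interaction."""
    return np.concatenate([x, y])


def full_bilinear(x, y, W):
    """Full bilinear fusion: z_k = x^T W_k y, with W of shape (o, m, n)."""
    return np.einsum("i,kij,j->k", x, W, y)


def outer_product_fusion(x, y, W):
    """Same result, written as a linear map over the flattened outer product."""
    outer = np.outer(x, y).ravel()            # shape (m * n,)
    return W.reshape(W.shape[0], -1) @ outer  # shape (o,)


def low_rank_bilinear(x, y, U, V, P):
    """Low-rank variant: project both inputs to `rank` dims, multiply
    element-wise, then project to the output size."""
    return P.T @ ((U.T @ x) * (V.T @ y))


# Full bilinear weights: o * m * n parameters.
W = rng.standard_normal((o, m, n))

# Low-rank factors: rank * (m + n + o) parameters.
U = rng.standard_normal((m, rank))
V = rng.standard_normal((n, rank))
P = rng.standard_normal((rank, o))

# The low-rank form corresponds to a full tensor W_k = U diag(P[:, k]) V^T.
W_low_rank = np.einsum("ir,rk,jr->kij", U, P, V)

z_concat = concat_fusion(x, y)
z_full = full_bilinear(x, y, W)
z_outer = outer_product_fusion(x, y, W)
z_low_rank = low_rank_bilinear(x, y, U, V, P)

full_params = W.size
low_rank_params = U.size + V.size + P.size
```

In this sketch the full bilinear tensor needs `o * m * n` weights, while the factorised form needs only `rank * (m + n + o)`, which is one way to read the abstract's remark that successive BLP techniques reduce computational expense. Realistic feature sizes are far larger than the toy values used here, so the gap widens accordingly.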

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling