Improved RAMEN: Towards Domain Generalization for Visual Question Answering

Bhanuka Manesha Samarasekara Vitharana Gamage; Lim Chern Hong

arXiv:2109.02370 · cs.CV · September 7, 2021 · 1 citation

TL;DR

This paper extends the RAMEN model for Visual Question Answering with vector-operation-based fusion strategies and a transformer-based aggregation module, improving domain generalization across multiple VQA datasets.

Contribution

It proposes two improvements to RAMEN's architecture, one to the fusion module and one to the aggregation module, so that the model generalizes better across diverse VQA datasets.
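The fusion idea can be illustrated with a minimal sketch, assuming region features of shape (n, d) and a single question embedding of shape (d,). Concatenation, RAMEN's baseline, doubles the feature width, whereas element-wise vector operations keep it at d. The function name `fuse` and the particular set of operations (addition, element-wise product, difference) are hypothetical and illustrative; they are not taken from the paper or its repository.

```python
import numpy as np


def fuse(regions: np.ndarray, question: np.ndarray, op: str = "mul") -> np.ndarray:
    """Hypothetical fusion helper: combine one question embedding with every region.

    regions:  (n, d) region features (assumed layout)
    question: (d,) question embedding
    """
    if regions.ndim != 2 or question.shape != (regions.shape[1],):
        raise ValueError("expected regions of shape (n, d) and question of shape (d,)")
    q = np.broadcast_to(question, regions.shape)  # repeat the question for every region
    if op == "concat":  # concatenation baseline: output width becomes 2d
        return np.concatenate([regions, q], axis=1)
    if op == "add":  # element-wise sum, width stays d
        return regions + q
    if op == "mul":  # element-wise (Hadamard) product, width stays d
        return regions * q
    if op == "sub":  # element-wise difference, width stays d
        return regions - q
    raise ValueError(f"unknown fusion op: {op!r}")


regions = np.array([[1.0, 2.0], [3.0, 4.0]])
question = np.array([10.0, 20.0])
fused = fuse(regions, question, "mul")  # values: [[10, 40], [30, 80]]
```

Keeping the width fixed at d is one practical reason to prefer vector operations over concatenation: downstream layers do not need to grow with each fusion step.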

Findings

01 Performance improves on up to five VQA datasets.

02 Vector-operation-based fusion strategies improve the integration of visual and question features.

03 Transformer-based aggregation improves robustness across domains.
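A transformer-style aggregation step can be sketched as follows, assuming the fused region embeddings form an (n, d) matrix. The sketch applies one single-head self-attention layer with a residual connection and a parameter-free layer norm, then mean-pools over regions. The function names, the single head, the missing feed-forward sublayer, and the mean pooling are simplifying assumptions, not the paper's configuration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each row to zero mean and unit variance (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention_aggregate(tokens, w_q, w_k, w_v):
    """Hypothetical aggregator: pool (n, d) fused region embeddings into one (d,) vector.

    w_q, w_k, w_v are (d, d) projection matrices.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n, n) scaled dot products
    attn = softmax(scores, axis=-1)          # each row sums to 1
    mixed = layer_norm(tokens + attn @ v)    # residual connection + layer norm
    return mixed.mean(axis=0)                # mean over regions


rng = np.random.default_rng(0)
d, n = 8, 5
tokens = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
pooled = attention_aggregate(tokens, w_q, w_k, w_v)
print(pooled.shape)  # (8,)
```

Because this sketch uses no positional encoding, shuffling the regions leaves the pooled vector unchanged. A recurrent aggregator, by contrast, depends on the order in which regions are fed in.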

Abstract

Visual Question Answering (VQA) is an emerging area of artificial intelligence whose models are now nearing human-level performance. As a multidisciplinary field of machine learning, VQA brings the computer vision and natural language processing communities together in pursuit of state-of-the-art (SOTA) performance. However, a gap remains between SOTA results and real-world applications, owing to a lack of model generalization. The RAMEN model (Shrestha et al., 2019) aimed to achieve domain generalization by obtaining the highest score across two main types of VQA datasets. This study proposes two major improvements to the RAMEN architecture, one to its early/late fusion module and one to its aggregation module, with the objective of further strengthening domain generalization. Vector-operation-based fusion strategies are introduced for the fusion module and the transformer…
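The terms "early" and "late" fusion refer to where the question enters the pipeline. The following minimal sketch assumes a RAMEN-style layout: the question is fused with each region before a shared per-region projection (early fusion), fused again afterwards (late fusion), and the result is then aggregated. The function name `ramen_style_forward`, the single linear layer `w_proj` standing in for the per-region MLP, and the mean pooling standing in for the aggregation module are all hypothetical simplifications.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def ramen_style_forward(regions, question, w_proj, op=np.multiply):
    """Hypothetical pass showing where early and late fusion sit.

    regions:  (n, d) region features; question: (d,) question embedding
    w_proj:   (d, d) shared projection applied to every region (stand-in for an MLP)
    op:       element-wise vector operation used for both fusion points
    """
    early = op(regions, question)   # early fusion: before the shared projection
    bimodal = relu(early @ w_proj)  # same weights applied to each region
    late = op(bimodal, question)    # late fusion: question re-injected
    return late.mean(axis=0)        # placeholder for the aggregation module


rng = np.random.default_rng(1)
d, n = 16, 36
pooled = ramen_style_forward(rng.normal(size=(n, d)), rng.normal(size=d),
                             rng.normal(size=(d, d)) / np.sqrt(d))
print(pooled.shape)  # (16,)
```

Passing `op=np.add` instead of the default `np.multiply` swaps the fusion strategy without touching the rest of the pipeline, which mirrors the idea of comparing several vector operations inside one fixed architecture.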

Code & Models

Repositories

bhanukamanesha/ramen
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning