IIU: Independent Inference Units for Knowledge-based Visual Question Answering

Yili Li; Jing Yu; Keke Gai; Gang Xiong

arXiv:2408.07989 · cs.CV · August 16, 2024

TL;DR

This paper introduces Independent Inference Units (IIU), a novel multi-modal reasoning framework for knowledge-based visual question answering that improves interpretability and generalization by decomposing intra-modal information into functionally independent units.

Contribution

The paper proposes the IIU framework, which assigns intra-modal clues to independent inference units that exchange information through communication, and adds a memory update module, improving reasoning interpretability and performance.
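The three ingredients named above (independent units, communication between units, and a memory update) can be pictured with a toy sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the function names, the per-unit scalar `weights`, the dot-product attention used for communication, and the sigmoid gate used for the memory update are all hypothetical stand-ins, since this page does not specify those mechanisms.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def unit_update(state, clue, weight):
    # Hypothetical: each unit owns one scalar weight, standing in for
    # the separate parameters of an independent inference unit.
    return [math.tanh(weight * c + s) for s, c in zip(state, clue)]

def communicate(states):
    # Assumed mechanism: each unit attends over the OTHER units' states
    # (dot-product attention) and adds the resulting message to its own.
    out = []
    for i, s in enumerate(states):
        others = [t for j, t in enumerate(states) if j != i]
        if not others:  # a single unit has nobody to talk to
            out.append(list(s))
            continue
        attn = softmax([dot(s, t) for t in others])
        msg = [sum(a * t[k] for a, t in zip(attn, others)) for k in range(len(s))]
        out.append([x + m for x, m in zip(s, msg)])
    return out

def memory_update(memory, states, question):
    # Assumed mechanism: pool the unit states, then blend them into memory
    # with a gate in (0, 1) driven by their relevance to the question.
    pooled = [sum(col) / len(states) for col in zip(*states)]
    gate = 1.0 / (1.0 + math.exp(-dot(pooled, question)))
    return [(1 - gate) * m + gate * p for m, p in zip(memory, pooled)]

def reason(clues, question, weights, steps=2):
    # One clue vector per unit; all vectors share the question's dimension.
    dim = len(question)
    states = [[0.0] * dim for _ in clues]
    memory = [0.0] * dim
    for _ in range(steps):
        states = [unit_update(s, c, w) for s, c, w in zip(states, clues, weights)]
        states = communicate(states)
        memory = memory_update(memory, states, question)
    return states, memory

if __name__ == "__main__":
    clues = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # made-up clue vectors
    states, memory = reason(clues, question=[1.0, 1.0], weights=[0.5, 1.0, 1.5])
    print(len(states), len(memory))
```

The point of the sketch is the control flow: each unit first processes only its own clue, information crosses unit boundaries only in the explicit communication step, and the memory is revised once per reasoning step instead of being overwritten.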

Findings

01. Achieves 3% performance improvement over existing models
02. Provides explainable reasoning evidence
03. Surpasses basic pretrained multi-modal models

Abstract

Knowledge-based visual question answering requires external knowledge beyond the visible content to answer a question correctly. One limitation of existing methods is that they focus mainly on modeling inter-modal and intra-modal correlations, which entangles complex multimodal clues in implicit embeddings and limits interpretability and generalization. The key challenge in addressing this limitation is to separate the information and process each part separately at the functional level. Reusing each processing unit can then improve the model's ability to generalize across different data. In this paper, we propose Independent Inference Units (IIU) for fine-grained multi-modal reasoning, which decompose intra-modal information into functionally independent units. Specifically, IIU processes each semantic-specific intra-modal clue by an independent inference unit, which also…

Code & Models

Repositories

lilidamowang/iiu (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Focus