Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View
Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang

TL;DR
This paper interprets the language prior problem in VQA as a class-imbalance issue and proposes a loss re-scaling method to improve model performance by addressing answer frequency disparities.
Contribution
It introduces a novel class-imbalance perspective for understanding the language prior problem and develops a loss re-scaling technique to mitigate answer frequency bias in VQA.
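To make the idea of re-scaling a loss by answer frequency concrete, here is a minimal sketch. It is not the authors' formulation, which this summary does not spell out. It assumes the common "balanced" inverse-frequency heuristic, and the names `answer_weights` and `rescaled_cross_entropy` are hypothetical, as is the toy data.

```python
import math
from collections import Counter


def answer_weights(train_answers):
    """Return one weight per answer, inversely proportional to its frequency.

    Assumption: weight = total / (num_classes * count), a generic balancing
    heuristic rather than the paper's own weighting scheme.
    """
    counts = Counter(train_answers)
    total = len(train_answers)
    num_classes = len(counts)
    return {answer: total / (num_classes * count) for answer, count in counts.items()}


def rescaled_cross_entropy(probs, target, weights):
    """Cross-entropy for a single example, scaled by the target answer's weight."""
    return -weights[target] * math.log(probs[target])


# Toy training set: "yes" is four times as frequent as "no".
train_answers = ["yes"] * 8 + ["no"] * 2
weights = answer_weights(train_answers)

# The model is equally unsure about both answers.
probs = {"yes": 0.5, "no": 0.5}
loss_yes = rescaled_cross_entropy(probs, "yes", weights)
loss_no = rescaled_cross_entropy(probs, "no", weights)
```

With identical predicted probabilities, the rare answer incurs the larger loss, so mistakes on infrequent answers contribute more to the gradient than mistakes on frequent ones.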
Findings
Improved accuracy on VQA-CP datasets using the proposed method.
The class imbalance interpretation is valid for other vision tasks.
The approach effectively reduces bias towards frequent answers.
Abstract
Recent studies have pointed out that many well-developed Visual Question Answering (VQA) models are heavily affected by the language prior problem, which refers to making predictions based on the co-occurrence pattern between textual questions and answers instead of reasoning visual contents. To tackle it, most existing methods focus on enhancing visual feature learning to reduce this superficial textual shortcut influence on VQA model decisions. However, limited effort has been devoted to providing an explicit interpretation for its inherent cause. It thus lacks a good guidance for the research community to move forward in a purposeful way, resulting in model construction perplexity in overcoming this non-trivial problem. In this paper, we propose to interpret the language prior problem in VQA from a class-imbalance view. Concretely, we design a novel interpretation scheme whereby the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
