TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering
Ahmed Akl, Abdelwahed Khamis, Zhe Wang, Ali Cheraghian, Sara Khalifa, Kewen Wang

TL;DR
This paper introduces TPCL, a curriculum learning framework that enhances VQA models' robustness across various data regimes by progressively training based on question type and difficulty, leading to state-of-the-art results.
Contribution
The paper proposes a novel task-progressive curriculum learning method for VQA that improves generalization without data augmentation or explicit debiasing.
Findings
TPCL outperforms existing baselines by over 5% on VQA-CP v2.
TPCL boosts backbone performance by up to 28.5%.
TPCL demonstrates strong generalization across IID, OOD, and low-data settings.
Abstract
Visual Question Answering (VQA) systems are notoriously brittle under distribution shifts and data scarcity. While previous solutions, such as ensemble methods and data augmentation, can improve performance in isolation, they fail to generalise well across in-distribution (IID), out-of-distribution (OOD), and low-data settings simultaneously. We argue that this limitation stems from the suboptimal training strategies employed. Specifically, treating all training samples uniformly, without accounting for question difficulty or semantic structure, leaves the models vulnerable to dataset biases. Thus, they struggle to generalise beyond the training distribution. To address this issue, we introduce Task-Progressive Curriculum Learning (TPCL), a simple, model-agnostic framework that progressively trains VQA models using a curriculum built by jointly considering question type and difficulty.…
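The abstract describes the curriculum only at a high level, so the following is a minimal, hypothetical sketch of a generic task-progressive schedule, not the authors' implementation. It assumes that each question type is treated as a "task", that a per-task difficulty score is supplied externally (for example, the mean loss of a reference model on that question type), and that harder tasks are unlocked linearly over a fixed number of stages. The names `task_progressive_schedule`, `task_difficulty`, and `num_stages` are illustrative inventions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample format: (question_type, sample_id).
Sample = Tuple[str, int]


def group_by_task(samples: Iterable[Sample]) -> Dict[str, List[Sample]]:
    """Partition samples into 'tasks', one per question type (an assumption)."""
    tasks: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        tasks[sample[0]].append(sample)
    return dict(tasks)


def task_progressive_schedule(
    samples: Iterable[Sample],
    task_difficulty: Dict[str, float],
    num_stages: int,
) -> List[List[Sample]]:
    """Return one training pool per stage; later stages add harder tasks.

    task_difficulty is an assumed per-question-type score (lower = easier).
    """
    tasks = group_by_task(samples)
    # Order question types from easiest to hardest.
    ordered = sorted(tasks, key=lambda t: task_difficulty[t])
    stages: List[List[Sample]] = []
    for stage in range(1, num_stages + 1):
        # Hypothetical linear pacing: unlock a growing fraction of tasks;
        # the final stage always covers every task.
        n_tasks = math.ceil(len(ordered) * stage / num_stages)
        pool = [s for t in ordered[:n_tasks] for s in tasks[t]]
        stages.append(pool)
    return stages


# Toy usage with made-up question types and difficulty scores.
samples = [("yes/no", 0), ("counting", 1), ("what color", 2), ("yes/no", 3)]
difficulty = {"yes/no": 0.2, "what color": 0.5, "counting": 0.9}
stages = task_progressive_schedule(samples, difficulty, num_stages=3)
for i, pool in enumerate(stages):
    print(f"stage {i}: {pool}")
```

In this sketch each stage's pool is a superset of the previous one, so easy question types are revisited rather than dropped. Whether TPCL replays earlier tasks, and how it actually measures difficulty, is not stated in the excerpt above.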
Taxonomy
Topics: Multimodal Machine Learning Applications
