TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering

Ahmed Akl; Abdelwahed Khamis; Zhe Wang; Ali Cheraghian; Sara Khalifa; Kewen Wang

arXiv:2411.17292·cs.CV·March 24, 2026

TL;DR

This paper introduces TPCL, a curriculum learning framework that enhances VQA models' robustness across various data regimes by progressively training based on question type and difficulty, leading to state-of-the-art results.

Contribution

The paper proposes a novel task-progressive curriculum learning method for VQA that improves generalization without data augmentation or explicit debiasing.

Findings

1. TPCL outperforms existing baselines by over 5% on VQA-CP v2.

2. TPCL boosts backbone performance by up to 28.5%.

3. TPCL generalizes well across IID, OOD, and low-data settings.

Abstract

Visual Question Answering (VQA) systems are notoriously brittle under distribution shifts and data scarcity. While previous solutions, such as ensemble methods and data augmentation, can improve performance in isolation, they fail to generalise well across in-distribution (IID), out-of-distribution (OOD), and low-data settings simultaneously. We argue that this limitation stems from the suboptimal training strategies employed. Specifically, treating all training samples uniformly, without accounting for question difficulty or semantic structure, leaves the models vulnerable to dataset biases. Thus, they struggle to generalise beyond the training distribution. To address this issue, we introduce Task-Progressive Curriculum Learning (TPCL), a simple, model-agnostic framework that progressively trains VQA models using a curriculum built by jointly considering question type and difficulty.…
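The abstract's central idea, a curriculum organised by question type and ordered by difficulty, can be illustrated with a minimal sketch. This is not the authors' implementation: the sample schema, the `difficulty_of` scoring function, the example scores, and the choice to grow the training pool cumulatively are all assumptions made for illustration, since the abstract does not specify how difficulty is measured or how stages are scheduled.

```python
from collections import defaultdict


def build_task_curriculum(samples, difficulty_of):
    """Group samples by question type and order the groups easiest-first.

    `samples` is a list of dicts with a "question_type" key (hypothetical schema).
    `difficulty_of` maps a question type to a numeric score (hypothetical;
    lower is treated as easier).
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["question_type"]].append(sample)
    ordered_types = sorted(groups, key=difficulty_of)
    return [(qtype, groups[qtype]) for qtype in ordered_types]


def progressive_stages(curriculum):
    """Yield a growing training pool: stage k holds the k easiest tasks.

    Cumulative growth is an assumption; other schedules are possible.
    """
    pool = []
    for qtype, task_samples in curriculum:
        pool = pool + task_samples
        yield qtype, list(pool)


# Toy data and made-up difficulty scores, purely for illustration.
samples = [
    {"question": "Is it raining?", "question_type": "yes/no"},
    {"question": "How many dogs?", "question_type": "number"},
    {"question": "What color is the car?", "question_type": "other"},
    {"question": "Is the door open?", "question_type": "yes/no"},
]
difficulty = {"yes/no": 0.2, "number": 0.9, "other": 0.6}

curriculum = build_task_curriculum(samples, difficulty.get)
task_order = [qtype for qtype, _ in curriculum]
stage_sizes = [len(pool) for _, pool in progressive_stages(curriculum)]
```

In a training loop, each yielded pool would be passed to the backbone's usual training step for some number of epochs before the next task is added. Because the curriculum only reorders existing data, the sketch stays model-agnostic in the sense the abstract describes.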

Taxonomy

Topics: Multimodal Machine Learning Applications