MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering
Zhifei Li, Yiran Wang, Chenyi Xiong, Yujing Xia, Xiaoju Hou, Yue Zhao, Miao Zhang, Kui Xiao, Bing Yang

TL;DR
MacVQA introduces adaptive memory and noise filtering techniques to improve continual visual question answering, effectively balancing knowledge retention, adaptation, and robust feature representation across multiple tasks.
Contribution
The paper presents a novel framework, MacVQA, that combines adaptive memory allocation and global noise filtering to enhance continual VQA performance.
Findings
Achieves 43.38% average accuracy on standard tasks.
Reduces average forgetting to 2.32%.
Outperforms existing baselines on ten continual VQA tasks.
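The two headline numbers above are continual-learning metrics. As a minimal sketch, assuming the commonly used definitions (the paper's exact formulas may differ), both can be computed from an accuracy matrix in which `acc[i][j]` is the accuracy on task `j` after training through task `i`. The function names and the toy numbers in the usage note are illustrative and are not taken from the paper.

```python
from typing import List


def average_accuracy(acc: List[List[float]]) -> float:
    """Mean accuracy over all tasks, measured after training on the last task."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from each earlier task's best past accuracy to its final accuracy.

    Assumed definition: for every task j except the last, take the highest
    accuracy seen on j before the final training stage and subtract the
    accuracy on j after the final stage.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing can be forgotten with a single task
    drops = []
    for j in range(num_tasks - 1):
        # Task j is only evaluated from stage j onward (lower-triangular matrix).
        best_before_final = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_final - acc[-1][j])
    return sum(drops) / len(drops)
```

For a toy three-task run with `acc = [[60.0, 0.0, 0.0], [55.0, 50.0, 0.0], [52.0, 49.0, 40.0]]`, the final row averages to 47.0 and the forgetting is (8.0 + 1.0) / 2 = 4.5. Lower forgetting means earlier tasks lose less accuracy as new tasks are learned.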
Abstract
Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has been made in retaining knowledge and adapting to new information in the VQA domain. However, current methods often struggle to balance knowledge retention, adaptation, and robust feature representation. To address these challenges, we propose MacVQA, a novel framework for visual question answering that combines adaptive memory allocation with global noise filtering. MacVQA fuses visual and question information while filtering noise to ensure robust representations, and employs prototype-based memory allocation to optimize feature quality and memory usage. These designs enable MacVQA to balance knowledge acquisition, retention, and compositional generalization in continual VQA learning. Experiments…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
