HyLoVQA: Dynamic Hypernetwork-Generated Low-Rank Adaptation for Continual Visual Question Answering
Yiran Wang, Chenyi Xiong, Ziyue Qin, Miao Zhang, Kui Xiao, Zhifei Li

TL;DR
HyLoVQA introduces a dynamic hypernetwork approach with a memory bank and alignment loss to improve continual visual question answering by reducing task interference and enhancing adaptation.
Contribution
It proposes a novel method combining a memory bank, hypernetwork-generated LoRA adapters, and an alignment loss for better continual VQA performance.
Findings
Outperforms prior state-of-the-art methods on the VQA v2 and NExT-QA datasets.
Effectively adapts to new tasks and objects with parameter efficiency.
Reduces cross-task interference through semantic alignment.
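The parameter-efficiency finding can be made concrete with the standard LoRA parameter count. The sketch below compares a full weight update of size d_out x d_in with a rank-r update B @ A. The layer width (768) and rank (8) are hypothetical values chosen for illustration; the summary does not state the sizes used in the paper.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense update of a d_out x d_in weight matrix."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a low-rank update B @ A, with A: rank x d_in and B: d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical sizes, not taken from the paper.
d_in = d_out = 768
rank = 8

full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(full, lora, full // lora)  # 589824 12288 48
```

With these illustrative sizes, the low-rank update needs 48 times fewer parameters than a dense update of the same layer.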
Abstract
Continual Visual Question Answering (VQA) requires learning from non-stationary streams of visual inputs and questions while preserving past knowledge. Most prior methods adapt by updating a largely shared parameter set, which often leads to cross-level task interference and hinders accurate adaptation to the current task and object. To address this limitation, we propose HyLoVQA. It maintains a drift-resilient memory bank of anchors that store the content of visual objects and textual tasks and are updated using current input features. Conditioned on retrieved anchors, a hypernetwork generates lightweight Low-Rank Adaptation (LoRA) adapters, which keeps the method parameter-efficient while allowing the model to adapt to each task and object dynamically. Additionally, we formulate an alignment loss that aligns semantic discrepancies in the feature space with functional changes in the…
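The pipeline described in the abstract (retrieve an anchor, update it from the current feature, generate a LoRA adapter from it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the retrieval metric, the update rule, or the hypernetwork architecture, so cosine-similarity retrieval, an exponential-moving-average update, and a single linear hypernetwork are all assumptions. The names `AnchorBank`, `LoRAHypernet`, and `adapted_forward` are hypothetical, and the alignment loss is omitted because its definition is truncated in the abstract.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in M]


class AnchorBank:
    """Hypothetical memory bank: nearest-anchor retrieval plus an EMA update (assumed)."""

    def __init__(self, anchors, momentum=0.9):
        self.anchors = [list(a) for a in anchors]
        self.momentum = momentum

    def retrieve(self, feature):
        # Index of the anchor most similar to the current input feature.
        sims = [cosine(a, feature) for a in self.anchors]
        return max(range(len(sims)), key=sims.__getitem__)

    def update(self, idx, feature):
        # Move the retrieved anchor slightly toward the current feature.
        m = self.momentum
        self.anchors[idx] = [m * a + (1 - m) * f
                             for a, f in zip(self.anchors[idx], feature)]


class LoRAHypernet:
    """Hypothetical linear hypernetwork mapping an anchor to LoRA factors A and B."""

    def __init__(self, anchor_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_out = rank * d_in + d_out * rank  # entries of A plus entries of B
        self.H = [[rng.gauss(0, 0.1) for _ in range(anchor_dim)]
                  for _ in range(n_out)]

    def generate(self, anchor):
        flat = matvec(self.H, anchor)
        r, d_in, d_out = self.rank, self.d_in, self.d_out
        A = [flat[i * d_in:(i + 1) * d_in] for i in range(r)]  # rank x d_in
        off = r * d_in
        B = [flat[off + i * r:off + (i + 1) * r] for i in range(d_out)]  # d_out x rank
        return A, B


def adapted_forward(W, A, B, x):
    """Frozen weight W plus generated low-rank update: W x + B (A x)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + d for b, d in zip(base, delta)]


# Toy usage with made-up numbers.
bank = AnchorBank([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
feature = [0.9, 0.1, 0.0]
idx = bank.retrieve(feature)
bank.update(idx, feature)

hyper = LoRAHypernet(anchor_dim=3, d_in=4, d_out=2, rank=2)
A, B = hyper.generate(bank.anchors[idx])
W = [[0.1] * 4, [0.2] * 4]  # stands in for a frozen backbone weight
y = adapted_forward(W, A, B, [1.0, 2.0, 3.0, 4.0])
```

In this sketch only the hypernetwork matrix `H` and the anchors would change across tasks, while `W` stays fixed. The abstract describes anchors for both visual objects and textual tasks; a single bank is used here only to keep the example short.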
