PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery
Runlong He, Danyal Z. Khan, Evangelos B. Mazomenos, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam

TL;DR
This paper introduces PitVQA++, a novel fine-tuning method for vision-language models in surgical VQA, utilizing vector matrix-low-rank adaptation to improve performance and reliability in pituitary surgery applications.
Contribution
It proposes Vector-MoLoRA, an innovative parameter-efficient fine-tuning approach that allocates more parameters to early layers, enhancing surgical VQA performance and mitigating catastrophic forgetting.
Findings
Effective performance improvement on PitVQA++ and EndoVis18-VQA datasets.
Enhanced reliability and uncertainty handling in predictions.
Significant mitigation of catastrophic forgetting.
Abstract
Vision-Language Models (VLMs) in visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, developing VLMs for surgical VQA is challenging because datasets are limited and full fine-tuning of pretrained weights risks overfitting and catastrophic forgetting. While parameter-efficient techniques like Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, which learn general features, require more parameters than later ones. This work introduces PitVQA++ with an open-ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting…
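The abstract's central idea, giving earlier layers more adapter parameters than later ones, can be illustrated with a small sketch. This is not the authors' Vector-MoLoRA implementation: the linear schedule, the function names `linear_rank_schedule` and `lora_param_count`, and the layer count, rank bounds, and hidden size are all assumptions chosen for illustration. It only shows how a per-layer rank vector changes the LoRA parameter budget compared with a uniform rank.

```python
def linear_rank_schedule(num_layers, max_rank, min_rank):
    """Return one adapter rank per layer, decreasing linearly with depth.

    Hypothetical schedule for illustration only; the paper may allocate
    ranks differently.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if min_rank > max_rank:
        raise ValueError("min_rank must not exceed max_rank")
    if num_layers == 1:
        return [max_rank]
    step = (max_rank - min_rank) / (num_layers - 1)
    return [round(max_rank - i * step) for i in range(num_layers)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA adapter.

    LoRA learns an update B @ A with B of shape (d_out, rank) and
    A of shape (rank, d_in), so it adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


# Assumed toy configuration: 12 layers, hidden size 768.
NUM_LAYERS = 12
HIDDEN = 768

ranks = linear_rank_schedule(NUM_LAYERS, max_rank=64, min_rank=8)
per_layer = [lora_param_count(HIDDEN, HIDDEN, r) for r in ranks]

# Uniform baseline using the same rank in every layer, for comparison.
uniform = [lora_param_count(HIDDEN, HIDDEN, 8) for _ in range(NUM_LAYERS)]

if __name__ == "__main__":
    for layer, (r, n) in enumerate(zip(ranks, per_layer)):
        print(f"layer {layer:2d}: rank={r:3d} params={n}")
    print("total (decreasing ranks):", sum(per_layer))
    print("total (uniform rank 8): ", sum(uniform))
```

Under this assumed schedule the first layer receives the largest adapter and the last layer the smallest, so the extra trainable capacity is concentrated where the abstract argues it is most needed.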
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Attention Dropout · Discriminative Fine-Tuning · Multi-Head Attention · Adam · Softmax
