PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery

Runlong He; Danyal Z. Khan; Evangelos B. Mazomenos; Hani J. Marcus; Danail Stoyanov; Matthew J. Clarkson; Mobarakol Islam

arXiv:2502.14149·cs.CV·February 21, 2025

TL;DR

This paper introduces PitVQA++, a novel fine-tuning method for vision-language models in surgical VQA, utilizing vector matrix-low-rank adaptation to improve performance and reliability in pituitary surgery applications.

Contribution

It proposes Vector-MoLoRA, an innovative parameter-efficient fine-tuning approach that allocates more parameters to early layers, enhancing surgical VQA performance and mitigating catastrophic forgetting.
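The idea of giving earlier layers more adapter capacity than later ones can be illustrated with a small sketch. This is not the paper's implementation: the linear schedule, the rank bounds, the layer count of 12, the hidden size of 768, and the helper names `rank_schedule` and `lora_param_count` are all assumptions chosen for illustration, and the sketch covers only the LoRA-style parameter count, not how Vector-MoLoRA combines LoRA and MoRA.

```python
from typing import List


def rank_schedule(num_layers: int, max_rank: int, min_rank: int) -> List[int]:
    """Hypothetical schedule: rank falls linearly from max_rank at the
    first layer to min_rank at the last layer."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [max_rank]
    step = (max_rank - min_rank) / (num_layers - 1)
    return [round(max_rank - i * step) for i in range(num_layers)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter, where the update is B @ A
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * (d_in + d_out)


# Assumed values, not taken from the paper.
NUM_LAYERS, HIDDEN = 12, 768

ranks = rank_schedule(NUM_LAYERS, max_rank=24, min_rank=2)
params = [lora_param_count(HIDDEN, HIDDEN, r) for r in ranks]

# A uniform baseline with the same total rank budget, for comparison.
uniform_rank = sum(ranks) // NUM_LAYERS
uniform_params = [lora_param_count(HIDDEN, HIDDEN, uniform_rank)] * NUM_LAYERS

for layer, (r, p) in enumerate(zip(ranks, params)):
    print(f"layer {layer:2d}: rank {r:2d}, adapter params {p}")
print("total (decreasing schedule):", sum(params))
print("total (uniform schedule):   ", sum(uniform_params))
```

With these assumed values both schedules spend the same total parameter budget; the decreasing schedule simply shifts that budget toward the early layers, which is the allocation principle the contribution describes.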

Findings

01 Effective performance improvement on PitVQA++ and EndoVis18-VQA datasets.

02 Enhanced reliability and uncertainty handling in predictions.

03 Significant mitigation of catastrophic forgetting.

Abstract

Vision-Language Models (VLMs) in visual question answering (VQA) offer a unique opportunity to enhance intra-operative decision-making, promote intuitive interactions, and significantly advance surgical education. However, developing VLMs for surgical VQA is challenging because datasets are limited and full fine-tuning of pretrained weights risks overfitting and catastrophic forgetting. While parameter-efficient techniques such as Low-Rank Adaptation (LoRA) and Matrix of Rank Adaptation (MoRA) address adaptation challenges, their uniform parameter distribution overlooks the feature hierarchy in deep networks, where earlier layers, which learn general features, require more parameters than later ones. This work introduces PitVQA++ with an open-ended PitVQA dataset and vector matrix-low-rank adaptation (Vector-MoLoRA), an innovative VLM fine-tuning approach for adapting…

Code & Models

Repositories

hrl-mike/pitvqa-plus
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Attention Dropout · Discriminative Fine-Tuning · Multi-Head Attention · Adam · Softmax