Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

Long Bai; Guankun Wang; Mobarakol Islam; Lalithkumar Seenivasan; An Wang; Hongliang Ren

arXiv:2408.04958 · cs.CV · September 4, 2024

TL;DR

This paper introduces Surgical-VQLA++, a novel adversarial contrastive learning approach with calibrated co-attention for precise, robust surgical visual question answering and localization, improving safety and interpretability in robotic surgery.

Contribution

The paper proposes a calibrated co-attention embedding and an adversarial contrastive learning strategy for robust, localized surgical VQA. It also extends existing datasets and reports superior performance.

Findings

01. Achieves high accuracy in surgical VQA and localization tasks.

02. Demonstrates robustness against image corruptions.

03. Extends datasets for surgical visual question answering.

Abstract

Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicate the regions of interest corresponding to the given questions results in incomplete comprehension of the surgical scene. To tackle this, we propose the surgical visual question localized-answering (VQLA) for precise and context-aware responses to specific queries regarding surgical images. Furthermore, to address the strong demand for safety in surgical scenarios and potential corruptions in image acquisition and transmission, we propose a novel approach called Calibrated Co-Attention Gated…

Code & Models

Repositories

longbai1006/surgical-vqlaplus
PyTorch · Official

Taxonomy

Methods

Contrastive Learning · ALIGN