TL;DR
This paper introduces Surgical-VQLA++, a novel adversarial contrastive learning approach with calibrated co-attention for precise, robust surgical visual question answering and localization, improving safety and interpretability in robotic surgery.
Contribution
The paper proposes a calibrated co-attention embedding and an adversarial contrastive learning strategy for robust, localized surgical VQA. It also extends existing surgical VQA datasets and reports superior performance.
Findings
Achieves high accuracy in surgical VQA and localization tasks.
Demonstrates robustness against image corruptions.
Extends datasets for surgical visual question answering.
Abstract
Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicate the regions of interest corresponding to the given questions results in incomplete comprehension of the surgical scene. To tackle this, we propose the surgical visual question localized-answering (VQLA) for precise and context-aware responses to specific queries regarding surgical images. Furthermore, to address the strong demand for safety in surgical scenarios and potential corruptions in image acquisition and transmission, we propose a novel approach called Calibrated Co-Attention Gated…
Taxonomy
Methods: Contrastive Learning · ALIGN
