Advancing Surgical VQA with Scene Graph Knowledge

Kun Yuan; Manasi Kattel; Joel L. Lavanchy; Nassir Navab; Vinkle Srivastav; Nicolas Padoy

arXiv:2312.10251 · cs.CV · June 25, 2024 · 1 citation

TL;DR

This paper introduces a new surgical VQA dataset with scene graph knowledge and a novel model that incorporates geometric scene features, significantly improving answer accuracy for complex surgical questions.

Contribution

The work presents a surgical scene graph-based dataset and a VQA model that effectively integrates scene knowledge, addressing bias and reasoning limitations in existing systems.

Findings

01. The SSG-QA dataset is more diverse and less biased than existing surgical VQA datasets.

02. SSG-QA-Net outperforms previous methods across various question types.

03. Incorporating scene geometry improves VQA accuracy in surgical contexts.

Abstract

The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with language capabilities is emerging as a necessity. Our work aims to advance Visual Question Answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. First, we propose a Surgical Scene Graph-based dataset, SSG-QA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating…
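To make the scene-graph-to-question idea in the abstract concrete, here is a minimal sketch of how a graph of instrument and anatomy relations might be turned into question-answer pairs. It is not the authors' pipeline: the `Edge` type, the example relations, the question templates, and the function names `build_scene_graph` and `generate_qa` are all hypothetical, and the graph is written by hand rather than produced by segmentation and detection models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One relation in a toy scene graph (hypothetical structure)."""
    subject: str   # e.g. an instrument
    relation: str  # an action ("retract") or spatial relation ("left_of")
    obj: str       # an anatomy or another instrument
    kind: str      # "action" or "spatial"


def build_scene_graph() -> list[Edge]:
    # Hand-written example frame; in practice the edges would come from
    # model predictions on surgical video frames (assumption).
    return [
        Edge("grasper", "retract", "gallbladder", "action"),
        Edge("hook", "dissect", "cystic_duct", "action"),
        Edge("grasper", "left_of", "hook", "spatial"),
    ]


def _label(name: str) -> str:
    # Turn identifiers such as "cystic_duct" into readable text.
    return name.replace("_", " ")


def generate_qa(edges: list[Edge]) -> list[tuple[str, str]]:
    """Apply simple templates to each edge (illustrative templates only)."""
    qa = []
    for e in edges:
        subj, rel, obj = _label(e.subject), _label(e.relation), _label(e.obj)
        if e.kind == "action":
            qa.append((f"What is the {subj} doing?", rel))
            qa.append((f"Which instrument is used to {rel} the {obj}?", subj))
        elif e.kind == "spatial":
            qa.append((f"Which instrument is {rel} the {obj}?", subj))
    return qa


if __name__ == "__main__":
    for question, answer in generate_qa(build_scene_graph()):
        print(f"Q: {question}  A: {answer}")
```

Because every question is derived from an explicit edge, the answer is tied to the scene content rather than to the question wording, which is the kind of grounding the abstract's bias argument relies on.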

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
