Advancing Surgical VQA with Scene Graph Knowledge
Kun Yuan, Manasi Kattel, Joel L. Lavanchy, Nassir Navab, Vinkle Srivastav, Nicolas Padoy

TL;DR
This paper introduces a new surgical VQA dataset with scene graph knowledge and a novel model that incorporates geometric scene features, significantly improving answer accuracy for complex surgical questions.
Contribution
The work presents a surgical scene graph-based dataset and a VQA model that effectively integrates scene knowledge, addressing bias and reasoning limitations in existing systems.
Findings
The SSG-QA dataset is more diverse and less biased than existing datasets.
SSG-QA-Net outperforms previous methods across various question types.
Incorporating scene geometry improves VQA accuracy in surgical contexts.
Abstract
The modern operating room is becoming increasingly complex, requiring innovative intra-operative support systems. While the focus of surgical data science has largely been on video analysis, integrating surgical computer vision with language capabilities is emerging as a necessity. Our work aims to advance Visual Question Answering (VQA) in the surgical context with scene graph knowledge, addressing two main challenges in current surgical VQA systems: removing question-condition bias in the surgical VQA dataset and incorporating scene-aware reasoning in the surgical VQA model design. First, we propose a Surgical Scene Graph-based dataset, SSG-QA, generated by employing segmentation and detection models on publicly available datasets. We build surgical scene graphs using spatial and action information of instruments and anatomies. These graphs are fed into a question engine, generating…
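To make the scene-graph-to-question idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' question engine: the `Edge` type, the `build_scene_graph` and `generate_qa` helpers, the question templates, and the instrument and anatomy names are all hypothetical choices made for illustration. It only shows how edges carrying action or spatial relations could be turned into question-answer pairs by filling templates.

```python
from dataclasses import dataclass

# Hypothetical set of spatial relations; the paper's actual vocabulary may differ.
SPATIAL_RELATIONS = {"left_of", "right_of", "above", "below"}


@dataclass(frozen=True)
class Edge:
    """One scene graph edge: subject --relation--> object."""
    subject: str
    relation: str
    obj: str


def build_scene_graph():
    """Return a toy scene graph as a list of edges (illustrative values only)."""
    return [
        Edge("grasper", "grasp", "gallbladder"),   # action relation
        Edge("hook", "dissect", "cystic_duct"),    # action relation
        Edge("grasper", "left_of", "hook"),        # spatial relation
    ]


def generate_qa(edges):
    """Fill simple templates to turn each edge into a (question, answer) pair."""
    pairs = []
    for edge in edges:
        subject = edge.subject.replace("_", " ")
        obj = edge.obj.replace("_", " ")
        if edge.relation in SPATIAL_RELATIONS:
            # Spatial template: ask which node stands in this relation to the object.
            relation = edge.relation.replace("_", " ")
            pairs.append((f"What is {relation} the {obj}?", subject))
        else:
            # Action template: ask what the instrument is doing to the anatomy.
            pairs.append((f"What is the {subject} doing to the {obj}?", edge.relation))
    return pairs


if __name__ == "__main__":
    for question, answer in generate_qa(build_scene_graph()):
        print(f"Q: {question}  A: {answer}")
```

In this sketch every answer is read directly off the graph, so each question is grounded in a detected object or relation rather than in a free-form annotation.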
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Focus
