Towards Holistic Surgical Scene Understanding

Natalia Valderrama; Paola Ruiz Puentes; Isabela Hernández; Nicolás Ayobi; Mathilde Verlyk; Jessica Santander; Juan Caicedo; Nicolás Fernández; Pablo Arbeláez

arXiv:2212.04582·cs.CV·January 29, 2024

TL;DR

This paper introduces a comprehensive framework for understanding surgical scenes by creating a new dataset with multi-level annotations and proposing a Transformer-based model that leverages these annotations for improved recognition of surgical phases, steps, instruments, and actions.

Contribution

The paper presents the PSI-AVA dataset, annotated for phases, steps, instruments, and atomic actions, and the TAPIR model, which uses these multi-level annotations for holistic surgical scene understanding, moving beyond prior work that focused on isolated tasks.

Findings

01. TAPIR outperforms existing models on PSI-AVA and other datasets.

02. Multi-level annotations improve recognition accuracy across tasks.

03. The framework encourages integrated surgical scene analysis.

Abstract

Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our…

Code & Models

Repositories

bcv-uniandes/tapir (PyTorch, official)

Models

TimJaspersTue/SurgeNetModels (model)

Datasets

TimJaspersTue/SurgeNetYoutube (dataset, 212 downloads)
