Towards Holistic Surgical Scene Understanding
Natalia Valderrama, Paola Ruiz Puentes, Isabela Hernández, Nicolás Ayobi, Mathilde Verlyk, Jessica Santander, Juan Caicedo, Nicolás Fernández, Pablo Arbeláez

TL;DR
This paper introduces a comprehensive framework for understanding surgical scenes by creating a new dataset with multi-level annotations and proposing a Transformer-based model that leverages these annotations for improved recognition of surgical phases, steps, instruments, and actions.
Contribution
The paper presents the PSI-AVA dataset with detailed annotations and the TAPIR model that utilizes multi-level annotations for holistic surgical scene understanding, advancing prior work focused on isolated tasks.
Findings
TAPIR outperforms existing models on PSI-AVA and other datasets.
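TAPIR reuses the representation learned on the instrument detection task to improve its classification of phases, steps, and actions.
The framework treats phase, step, instrument, and atomic action recognition jointly rather than as isolated tasks.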
Abstract
Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our…
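As a rough illustration of the multi-level annotations described in the abstract, the sketch below models one annotated frame carrying long-term labels (phase, step) alongside short-term labels (instrument boxes and atomic actions). It is not the PSI-AVA file format: the class names, field names, box convention, and placeholder labels are all hypothetical, and attaching actions to individual instrument instances is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InstrumentInstance:
    """One detected instrument in a frame (hypothetical schema)."""
    label: str
    # Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
    box: Tuple[float, float, float, float]
    # Assumption: atomic actions are attached to each instrument instance.
    actions: List[str] = field(default_factory=list)


@dataclass
class FrameAnnotation:
    """All annotation levels for a single frame (hypothetical schema)."""
    video_id: str
    frame_index: int
    phase: str  # long-term label
    step: str   # long-term label
    instruments: List[InstrumentInstance] = field(default_factory=list)

    def long_term_labels(self) -> Dict[str, str]:
        """Return the labels that need long temporal context."""
        return {"phase": self.phase, "step": self.step}

    def short_term_labels(self) -> List[Tuple[str, Tuple[str, ...]]]:
        """Return (instrument label, actions) pairs for this frame."""
        return [(inst.label, tuple(inst.actions)) for inst in self.instruments]


# Placeholder values only; these are not real PSI-AVA categories.
example = FrameAnnotation(
    video_id="case_000",
    frame_index=120,
    phase="phase_A",
    step="step_A1",
    instruments=[
        InstrumentInstance(
            label="instrument_X",
            box=(10.0, 20.0, 110.0, 220.0),
            actions=["action_1", "action_2"],
        )
    ],
)
```

Keeping both levels on the same record makes it straightforward to feed one frame to several task heads at once, which is the kind of joint use the abstract argues for.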
