TL;DR
This paper introduces a new dataset and a novel method for recognizing fine-grained instrument-tissue interactions in endoscopic videos, enhancing surgical activity understanding with triplet-based modeling.
Contribution
It presents a new laparoscopic dataset, CholecT40, and a triplet recognition approach that combines a Class Activation Guide with a 3D Interaction Space for fine-grained activity analysis.
Findings
The proposed method outperforms baseline models on CholecT40.
The Class Activation Guide effectively guides verb and target recognition.
The 3D Interaction Space captures multiple triplets in the same frame.
Abstract
Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called Class Activation Guide (CAG), which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D Interaction Space, which captures the associations between the triplet components. Finally, we…
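To make the idea of an interaction space over <instrument, verb, target> concrete, here is a minimal sketch. It is not the paper's method: the abstract describes a trainable 3D Interaction Space, whereas this illustration only multiplies per-component probabilities. The vocabularies, function names, and threshold are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical toy vocabularies (assumption): the real class lists of
# CholecT40 are not given in this summary.
INSTRUMENTS = ["grasper", "hook"]
VERBS = ["grasp", "dissect"]
TARGETS = ["gallbladder", "liver"]


def interaction_space(p_instrument, p_verb, p_target):
    """Build a 3D nested list where space[i][v][t] = p_i * p_v * p_t.

    This fixed outer product is an assumption for illustration; the
    paper's space is learned rather than computed this way.
    """
    return [
        [[pi * pv * pt for pt in p_target] for pv in p_verb]
        for pi in p_instrument
    ]


def detect_triplets(space, threshold=0.5):
    """Return every (instrument, verb, target) cell scoring >= threshold."""
    found = []
    for i, v, t in product(
        range(len(INSTRUMENTS)), range(len(VERBS)), range(len(TARGETS))
    ):
        if space[i][v][t] >= threshold:
            found.append((INSTRUMENTS[i], VERBS[v], TARGETS[t]))
    return found


# One confident triplet: only a single cell of the space clears the threshold.
single = detect_triplets(interaction_space([0.9, 0.1], [0.8, 0.2], [0.9, 0.1]))

# Two instruments active at once: independent marginals cannot say which verb
# and target belong to which instrument, so every cell lights up.
ambiguous = detect_triplets(interaction_space([0.9, 0.9], [0.9, 0.9], [0.9, 0.9]))
```

In the second call all eight combinations pass the threshold, which shows why a plain product of independent component scores is not enough when several triplets occur in the same frame, and why a space that learns the associations between components is a plausible design.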
Taxonomy
Methods: Heatmap · Class activation guide · Convolution
