HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy

TL;DR
HecVL introduces a hierarchical video-language pretraining method that leverages multi-level textual supervision to enable zero-shot and cross-procedure surgical phase recognition, enhancing model transferability and reducing annotation needs.
Contribution
The paper proposes a hierarchical pretraining framework that pairs a three-level video-text dataset (clip, phase, and video level) with fine-to-coarse contrastive learning, improving zero-shot surgical phase recognition and cross-dataset transferability.
Findings
Enables zero-shot surgical phase recognition without human annotations.
Achieves effective transfer across different surgical procedures and medical centers.
Disentangles hierarchical embeddings to encode both short-term and long-term surgical concepts.
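To make the zero-shot finding concrete, here is a minimal sketch of how phase recognition can work once video and text share an embedding space: each candidate phase is described in text, and a clip is assigned to the phase whose text embedding is most similar. The phase names, the 3-dimensional vectors, and the function names below are illustrative assumptions, not values or APIs from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_phase(clip_embedding, phase_text_embeddings):
    """Return the phase whose text embedding is most similar to the clip."""
    return max(
        phase_text_embeddings,
        key=lambda name: cosine(clip_embedding, phase_text_embeddings[name]),
    )

# Hypothetical phase prompts, already encoded by a text encoder (toy vectors).
phase_text_embeddings = {
    "preparation": [1.0, 0.0, 0.0],
    "dissection": [0.0, 1.0, 0.0],
    "closure": [0.0, 0.0, 1.0],
}

# Hypothetical clip embedding from a video encoder (toy vector).
clip_embedding = [0.1, 0.9, 0.2]

predicted = zero_shot_phase(clip_embedding, phase_text_embeddings)
print(predicted)
```

No phase labels are used for training in this setup; moving to a new procedure only requires writing new text descriptions for its phases.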
Abstract
Natural language could play an important role in developing generalist surgical models by providing a broad source of supervision from raw texts. This flexible form of supervision can enable the model's transferability across datasets and tasks as natural language can be used to reference learned visual concepts or describe new ones. In this work, we present HecVL, a novel hierarchical video-language pretraining approach for building a generalist surgical model. Specifically, we construct a hierarchical video-text paired dataset by pairing the surgical lecture video with three hierarchical levels of texts: at clip-level, atomic actions using transcribed audio texts; at phase-level, conceptual text summaries; and at video-level, overall abstract text of the surgical procedure. Then, we propose a novel fine-to-coarse contrastive learning framework that learns separate embedding spaces for…
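The abstract is cut off before it describes the training objective, so the following is only a generic sketch of contrastive video-text learning applied once per hierarchy level. It uses a standard symmetric InfoNCE loss. The temperature value, the toy embeddings, and the plain sum over levels are assumptions for illustration and may differ from what HecVL actually does.

```python
import math

def info_nce(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    video_embs[i] and text_embs[i] are treated as the matching pair;
    every other combination in the batch serves as a negative.
    """
    logits = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in text_embs]
        for v in video_embs
    ]

    def cross_entropy(rows):
        # Mean of -log softmax(row)[i], with the diagonal as the target.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    video_to_text = cross_entropy(logits)
    text_to_video = cross_entropy([list(col) for col in zip(*logits)])
    return 0.5 * (video_to_text + text_to_video)

# Toy, already-normalised embeddings for the three levels named in the
# abstract. Each level has its own pair of (video, text) embedding lists,
# standing in for a separate embedding space per level.
levels = {
    "clip": ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    "phase": ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    "video": ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
}

# Assumed combination rule: an unweighted sum of the per-level losses.
total_loss = sum(info_nce(v, t) for v, t in levels.values())

# Sanity check: correctly paired data scores lower than mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

Keeping a separate loss per level means the clip-level, phase-level, and video-level texts each pull on their own embedding space rather than competing in a single one.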
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Contrastive Learning
