ACT-Net: Anchor-context Action Detection in Surgery Videos
Luoying Hao, Yan Hu, Wenjun Lin, Qun Wang, Heng Li, Huazhu Fu, Jinming Duan, and Jiang Liu

TL;DR
This paper introduces ACTNet, a novel surgical action detection network that leverages anchor-context interactions and diffusion models to improve accuracy and confidence estimation in surgical videos.
Contribution
The paper proposes ACTNet with an anchor-context detection module and a class conditional diffusion module, enhancing surgical action detection accuracy and confidence estimation.
Findings
Achieved a 4.0% mAP improvement over the baseline.
State-of-the-art performance on a surgical video dataset.
Effective confidence estimation from diffusion model outputs.
Abstract
Recognizing and localizing detailed surgical actions is an essential component of a context-aware decision support system. However, most existing detection algorithms fail to classify actions accurately even when they locate them, because they do not consider the regularity of the surgical procedure across the whole video. This limitation hinders their application. Moreover, deploying predictions in clinical applications requires conveying model confidence to earn trust, which is unexplored in surgical action prediction. In this paper, to accurately detect fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACTNet), including an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what the actions are; 3) how…
Taxonomy
Topics: Surgical Simulation and Training · Medical Imaging and Analysis
Methods: Diffusion
