Rethinking Causality-driven Robot Tool Segmentation with Temporal Constraints
Hao Ding, Jie Ying Wu, Zhaoshuo Li, Mathias Unberath

TL;DR
This paper introduces TC-CaRTS, a method that uses temporal constraints to improve robot tool segmentation in video. It converges in fewer iterations than CaRTS while matching or exceeding its segmentation performance across domains.
Contribution
The paper proposes a temporal causal model and an architecture, TC-CaRTS, whose three modules (a temporal optimization pipeline, a kinematics correction network, and spatial-temporal regularization) use temporal information to improve robot tool segmentation.
Findings
TC-CaRTS converges faster than CaRTS, requiring fewer iterations.
TC-CaRTS achieves equal or better segmentation performance across different domains.
All three proposed modules are effective in improving segmentation results.
Abstract
Purpose: Vision-based robot tool segmentation plays a fundamental role in surgical robots and downstream tasks. CaRTS, based on a complementary causal model, has shown promising performance in unseen counterfactual surgical environments in the presence of smoke, blood, etc. However, CaRTS requires over 30 iterations of optimization to converge for a single image due to limited observability. Method: To address this limitation, we take temporal relations into account and propose a temporal causal model for robot tool segmentation on video sequences. We design an architecture named Temporally Constrained CaRTS (TC-CaRTS). TC-CaRTS has three novel modules that complement CaRTS: a temporal optimization pipeline, a kinematics correction network, and spatial-temporal regularization. Results: Experimental results show that TC-CaRTS requires far fewer iterations to achieve the same or…
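The abstract's claim that temporal information reduces the number of optimization iterations can be illustrated with a toy sketch. It is not the TC-CaRTS pipeline: it assumes a one-dimensional "pose" parameter, a quadratic loss, a slowly drifting target, and a constant kinematic bias, all of which are hypothetical choices made for illustration. The only idea it demonstrates is that starting each frame's optimization from the previous frame's result (a warm start) needs fewer iterations than restarting from a biased reading on every frame.

```python
def optimize(target, init, lr=0.2, tol=1e-3, max_iters=200):
    """Gradient descent on 0.5 * (x - target)**2.

    Returns the estimate and the number of update steps taken.
    """
    x = init
    for i in range(max_iters):
        grad = x - target
        if abs(grad) < tol:
            return x, i
        x -= lr * grad
    return x, max_iters


def run_sequence(targets, biased_inits, warm_start):
    """Optimize every frame; return the total number of iterations used."""
    total = 0
    prev = None
    for target, init in zip(targets, biased_inits):
        # Warm start: reuse the previous frame's estimate when available.
        start = prev if (warm_start and prev is not None) else init
        estimate, iters = optimize(target, start)
        total += iters
        prev = estimate
    return total


# Hypothetical data: the true value drifts slowly over 20 frames, and the
# per-frame initial reading carries a constant bias of 0.5.
targets = [1.0 + 0.01 * t for t in range(20)]
biased_inits = [t + 0.5 for t in targets]

cold_total = run_sequence(targets, biased_inits, warm_start=False)
warm_total = run_sequence(targets, biased_inits, warm_start=True)
print("cold start iterations:", cold_total)
print("warm start iterations:", warm_total)
```

In this sketch the warm-started run spends most of its iterations on the first frame only, because consecutive frames differ by far less than the bias in the initial reading. Whether and how TC-CaRTS carries estimates between frames is described in the paper itself, not here.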
Taxonomy
Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Surgical Simulation and Training
Methods: Test · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
