Fine-Grained Action Segmentation for Renorrhaphy in Robot-Assisted Partial Nephrectomy

Jiaheng Dai; Huanrong Liu; Tailai Zhou; Tongyu Jia; Qin Liu; Yutong Ban; Zeju Li; Yu Gao; Xin Ma; Qingbiao Li

arXiv:2604.09051·cs.CV·April 13, 2026

TL;DR

This paper introduces a benchmark for fine-grained action segmentation during renorrhaphy in robot-assisted partial nephrectomy. It compares four temporal models on 50 clinical videos using both frame-level and segment-level metrics.

Contribution

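The paper defines the SIA-RAPN benchmark, 50 clinical videos annotated with 12 frame-level labels, and evaluates four temporal models on it (MS-TCN++, AsFormer, TUT, and DiffAct). DiffAct scores highest on segmental F1, frame accuracy, edit score, and frame mAP.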

Findings

01

DiffAct achieves the highest segmental F1, frame accuracy, edit score, and frame mAP.

02

MS-TCN++ attains the highest balanced accuracy.

03

The benchmark includes cross-domain evaluation on a separate single-port RAPN dataset.
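
One model can lead on frame accuracy while another leads on balanced accuracy because the two metrics weight classes differently. The sketch below assumes the common definition of balanced accuracy as the mean of per-class recall; the paper's exact implementation is not shown on this page. The gesture labels `"pull"` and `"tie"` are hypothetical placeholders, not labels from SIA-RAPN.

```python
from collections import defaultdict


def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return correct / len(gt)


def balanced_accuracy(pred, gt):
    """Mean of per-class recall, so every class counts equally.

    Assumption: this is the usual definition; the benchmark may differ in detail.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, g in zip(pred, gt):
        totals[g] += 1
        hits[g] += int(p == g)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced clip: 8 frames of one gesture, 2 of another.
gt = ["pull"] * 8 + ["tie"] * 2
# A predictor that ignores the rare class entirely.
pred = ["pull"] * 10

print(frame_accuracy(pred, gt))
print(balanced_accuracy(pred, gt))
```

In this toy clip the predictor gets most frames right yet never recognizes the rare gesture, so balanced accuracy falls well below frame accuracy. With substantial class imbalance, as the abstract describes, the two metrics can rank models differently.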

Abstract

Fine-grained action segmentation during renorrhaphy in robot-assisted partial nephrectomy requires frame-level recognition of visually similar suturing gestures with variable duration and substantial class imbalance. The SIA-RAPN benchmark defines this problem on 50 clinical videos acquired with the da Vinci Xi system and annotated with 12 frame-level labels. The benchmark compares four temporal models built on I3D features: MS-TCN++, AsFormer, TUT, and DiffAct. Evaluation uses balanced accuracy, edit score, segmental F1 at overlap thresholds of 10, 25, and 50, frame-wise accuracy, and frame-wise mean average precision. In addition to the primary evaluation across five released split configurations on SIA-RAPN, the benchmark reports cross-domain results on a separate single-port RAPN dataset. Across the strongest reported values over those five runs on the primary dataset, DiffAct…
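
The segment-level metrics named in the abstract can be sketched as follows. This is a minimal illustration that assumes the definitions commonly used in action segmentation work: segmental F1 counts a predicted segment as a true positive when its IoU with an unmatched ground-truth segment of the same label reaches the threshold, and edit score is a normalized Levenshtein distance over the ordered segment labels. The function names and the single-letter labels are hypothetical, and the benchmark's own evaluation code may differ.

```python
def segments(labels):
    """Collapse frame labels into (label, start, end) runs; end is exclusive."""
    segs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs


def segmental_f1(pred, gt, threshold):
    """Segmental F1 at an IoU threshold given as a fraction (0.10, 0.25, 0.50)."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        # Each ground-truth segment can be claimed by at most one prediction.
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    fp = len(p_segs) - tp
    fn = len(g_segs) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def edit_score(pred, gt):
    """100 * (1 - Levenshtein distance / longer length) over segment labels."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return (1 - prev[-1] / max(len(p), len(g), 1)) * 100


# Hypothetical 10-frame clip with three gestures A, B, C.
gt = list("AAAABBBBCC")
# Prediction with a late A-to-B boundary and one spurious A segment.
pred = list("AAAAAABACC")

for t in (0.10, 0.25, 0.50):
    print(t, segmental_f1(pred, gt, t))
print(edit_score(pred, gt))
```

In this toy clip the short predicted B segment overlaps its ground-truth segment only slightly, so it counts as a match at the looser thresholds but not at 0.50. The spurious A segment lowers the edit score even though it covers a single frame, which is why edit score and segmental F1 are sensitive to over-segmentation in a way frame accuracy is not.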
