ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection

Yiliang Chen; Zhixi Li; Cheng Xu; Alex Qinyang Liu; Ruize Cui; Xuemiao Xu; Jeremy Yuen-Chun Teoh; Shengfeng He; Jing Qin

arXiv:2506.01130 · cs.CV · September 30, 2025

TL;DR

ProstaTD is a large, multi-institutional dataset with high-precision annotations for surgical triplet detection, enabling the transition from classification to full detection in surgical videos.

Contribution

The paper introduces ProstaTD, the largest annotated dataset for surgical triplet detection with precise spatial and temporal boundaries, developed through rigorous multi-institutional efforts.

Findings

01. Largest annotated surgical triplet detection dataset, with 71,775 frames and 196,490 triplet instances

02. Includes high-precision bounding boxes and clinically defined temporal boundaries

03. Provides tools for annotation and standardized evaluation

Abstract

Surgical triplet detection is a critical task in surgical video analysis. However, existing datasets like CholecT50 lack precise spatial bounding box annotations, rendering triplet classification at the image level insufficient for practical applications. The inclusion of bounding box annotations is essential to make this task meaningful, as they provide the spatial context necessary for accurate analysis and improved model generalizability. To address these shortcomings, we introduce ProstaTD, a large-scale, multi-institutional dataset for surgical triplet detection, developed from the technically demanding domain of robot-assisted prostatectomy. ProstaTD offers clinically defined temporal boundaries and high-precision bounding box annotations for each structured triplet activity. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21…
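
To make the step from image-level classification to detection concrete, here is a minimal sketch of how a detected triplet might be matched against an annotation. It is an illustration only: the `TripletBox` schema, the label strings, and the 0.5 IoU threshold are assumptions made for this example, not the dataset's actual format or the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TripletBox:
    """One annotated or predicted triplet instance (hypothetical schema)."""
    instrument: str
    action: str
    target: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, iou_threshold=0.5):
    """A detection counts only if all three labels match AND the boxes overlap.

    The 0.5 threshold is an assumed, commonly used default.
    """
    same_triplet = (pred.instrument, pred.action, pred.target) == (
        gt.instrument, gt.action, gt.target)
    return same_triplet and iou(pred.box, gt.box) >= iou_threshold


# Illustrative labels only; they are not taken from the dataset's vocabulary.
gt = TripletBox("scissors", "cut", "tissue", (0, 0, 10, 10))
pred_ok = TripletBox("scissors", "cut", "tissue", (0, 0, 10, 5))
pred_wrong_place = TripletBox("scissors", "cut", "tissue", (20, 20, 30, 30))
```

In this sketch `pred_wrong_place` carries the correct labels but is rejected because its box does not overlap the annotation. An image-level classifier cannot make that distinction, which is the gap the abstract describes.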

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

(1) ProstaTD integrates multiple sources (ESAD, PSI, PWH), covering different surgical styles, instrument usage, and rare triplets, which increases clinical coverage and reflects the complexity of real-world surgical scenarios.
(2) The authors developed dedicated software (Triplet-LabelMe and SurgLabel) supporting structured triplets, temporal propagation, and batch operations. Both tools are open-source, significantly accelerating surgical video annotation and improving reproducibility.
(3) The ivtdmetrics pack…

Weaknesses

(1) A large portion of the manuscript is devoted to describing the dataset composition, annotation tools, and triplet statistics, while the experimental contribution is relatively limited.
(2) Although the TDnet baseline is proposed, the comparisons are primarily against standard detectors. The paper lacks exploration of different architectures, hyperparameter sensitivity, or deeper ablation studies. Additionally, it does not demonstrate the dataset’s utility on other downstream tasks such as sur…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

Data contribution: First full‑procedure, multi‑institutional, box‑supervised triplet dataset with standardized temporal boundaries; strong annotation process and κ=0.82.
Benchmark breadth: Comprehensive baselines (from SSD/Faster R‑CNN to RT‑DETR/YOLOv10‑12) with accuracy‑speed trade‑offs.
Tools & reproducibility: Open annotation apps + evaluation toolkit; 5‑fold protocol.
Empirical insight: Higher concurrency/scene density than CholecT50, stressing realistic IVT detection.
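
For readers unfamiliar with the κ figure quoted above, the following sketch shows how a chance-corrected agreement score can be computed. The review excerpt does not say which kappa statistic the authors report, so this assumes Cohen's kappa between two raters; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example with made-up labels: the raters agree on 3 of 4 items.
rater_1 = ["x", "x", "y", "y"]
rater_2 = ["x", "x", "y", "x"]
kappa = cohens_kappa(rater_1, rater_2)  # p_o = 0.75, p_e = 0.5
```

Here the raw agreement is 0.75, but half of that is expected by chance, so kappa comes out at 0.5. Kappa is therefore a stricter measure than raw percent agreement.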

Weaknesses

Domain generalization not tested: No leave‑one‑source‑out evaluation (e.g., train on PWH+ESAD, test on PSI‑AVA). Reporting it would bolster the “multi‑institutional” claim.
Missing target boxes: Only instrument boxes are annotated; the lack of target localization limits explicit interaction modeling. Consider target boxes/segmentation or relation heads.
Method novelty modest: TDnet is effective but incremental; limited exploration of explicit relation modeling or stronger temporal modules.
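
The leave‑one‑source‑out protocol the reviewer asks for can be sketched as below. This is an assumption-laden illustration, not the authors' split: the source names come from the review text, while the video IDs and the `leave_one_source_out` helper are hypothetical.

```python
def leave_one_source_out(video_sources):
    """Yield (held_out_source, train_ids, test_ids) once per source.

    video_sources maps a video ID to the institution or dataset it came
    from. Every video from the held-out source goes to the test set, so
    the model never sees that source during training.
    """
    for held_out in sorted(set(video_sources.values())):
        train = sorted(v for v, s in video_sources.items() if s != held_out)
        test = sorted(v for v, s in video_sources.items() if s == held_out)
        yield held_out, train, test


# Hypothetical video IDs; only the source names appear in the review.
videos = {
    "video_01": "PWH",
    "video_02": "PWH",
    "video_03": "ESAD",
    "video_04": "PSI-AVA",
}

splits = list(leave_one_source_out(videos))
for held_out, train_ids, test_ids in splits:
    print(f"test on {held_out}: train={train_ids} test={test_ids}")
```

Unlike a random 5‑fold split, where frames from every source can appear in both training and test folds, this protocol measures how well a model transfers to an unseen institution.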

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. The paper addresses an important topic and proposes an extensive multi-center, multi-rater dataset that drives forward the field of triplet detection, encompassing 89 triplets of 7 instruments, 10 actions, and 10 targets on 71k frames.
2. The dataset is compared to other triplet datasets, clearly pointing out its benefits.
3. Further, the paper implements a benchmark using state-of-the-art network architectures and the novel TDnet.

Weaknesses

1. The paper devotes substantial space to criticizing CholecT50 rather than focusing on the independent contributions of the presented dataset.
2. The description of the TDnet architecture, the experimental design, and the ablation study are not contained in the paper but only in the appendix.

Minor Comments:
M1. The highlighting in Appendix Table 10 is inconsistent, as the higher value is not always highlighted in bold.
M2. The paper cites 7 arXiv sources. Are peer-reviewed publications for these p…

Code & Models

Datasets

yik-leung/ProstaTD
dataset · 3 dl

Taxonomy

Topics: Prostate Cancer Diagnosis and Treatment