ConsisTNet: a spatio-temporal approach for consistent anatomical localization in endoscopic pituitary surgery
Zhehua Mao, Adrito Das, Danyal Z. Khan, Simon C. Williams, John G. Hanrahan, Danail Stoyanov, Hani J. Marcus, Sophia Bano

TL;DR
ConsisTNet improves consistent anatomical localization in pituitary surgery by using spatio-temporal features for more stable and accurate real-time guidance.
Contribution
ConsisTNet introduces a novel spatio-temporal model with semi-supervised pseudo-labeling to enhance prediction consistency in endoscopic surgery.
Findings
ConsisTNet improves segmentation consistency by 4.56% and 9.45% in IoU for the two segmented regions.
Landmark detection consistency is enhanced with a 43.86% reduction in mean distance error.
The model achieves 202 FPS with FP16 precision, enabling real-time intraoperative use.
Abstract
Automated localization of critical anatomical structures in endoscopic pituitary surgery is crucial for enhancing patient safety and surgical outcomes. While deep learning models have shown promise in this task, their predictions often suffer from frame-to-frame inconsistency. This study addresses this issue by proposing ConsisTNet, a novel spatio-temporal model designed to improve prediction stability. ConsisTNet leverages spatio-temporal features extracted from consecutive frames to provide both temporally and spatially consistent predictions, addressing the limitations of single-frame approaches. We employ a semi-supervised strategy, utilizing ground-truth label tracking for pseudo-label generation through label propagation. Consistency is assessed by comparing predictions across consecutive frames using predicted label tracking. The model is optimized and accelerated using TensorRT…
Funding: Engineering and Physical Sciences Research Council (http://dx.doi.org/10.13039/501100000266)
Introduction
The pituitary gland, located at the base of the brain near critical structures such as the optic nerves and carotid arteries, plays a vital role in hormone regulation [1]. Tumors in this gland can disrupt hormone secretion and impair vision by compressing the optic nerves [1]. For symptomatic patients, transsphenoidal surgery is the standard treatment, with the endoscopic transsphenoidal approach (eTSA) increasingly preferred due to its minimally invasive nature. This technique allows tumor removal through the nasal cavity and sphenoid sinus while minimizing damage to surrounding structures. However, accurately identifying safe entry points on the sphenoid bone, particularly in the sella region, remains challenging due to the lack of distinct anatomical features and the tumor’s invisibility (Fig. 1a). Errors during surgery can lead to severe complications, such as vision loss or carotid artery injury [1].
To enhance spatial orientation, optical tracking systems integrated with preoperative 3D imaging are commonly used in eTSA [2]. However, these systems disrupt surgical workflow, requiring surgeons to mentally retain localization information, which increases cognitive load. Additionally, preoperative imaging does not account for intraoperative anatomical shifts, further complicating decision making. These limitations underscore the need for real-time, intraoperative guidance systems that integrate seamlessly into the surgical process.
Recent advancements in computer vision have explored endoscopic video-based techniques to identify the sella and other critical anatomical structures. PAINet [3] was the first to address this, with PitSurgRT [4] improving accuracy and achieving real-time performance. However, these models rely on discrete image frames, ignoring the temporal continuity of video data. This can lead to inconsistent predictions across consecutive frames (Fig. 1b) [5], potentially confusing surgeons and increasing intraoperative risks [6].
To address these challenges, we propose ConsisTNet, a spatio-temporal model designed to ensure consistent anatomical localization during eTSA. ConsisTNet leverages features from consecutive video frames and employs a semi-supervised approach to enforce both spatial and temporal consistency. While it builds on HRNet [7] and ConvLSTM [8], our model introduces a novel pseudo-label generation method tailored for video data. This method enables the use of temporal information from video sequences, reducing prediction volatility and improving consistency. Unlike prior research [3, 4], which has largely overlooked this issue, our work focuses on enhancing the stability of landmark detection and segmentation in pituitary surgery while maintaining the high accuracy of PitSurgRT. The key contributions of this work are as follows:
- A novel network architecture, ConsisTNet, that integrates spatio-temporal information to reduce prediction volatility during eTSA.
- A pseudo-label generation method based on CoTracker2 [9] for temporal learning and consistency evaluation in the absence of ground-truth data.
- A detailed analysis of the impact of temporal learning on reducing prediction variability in pituitary surgical video sequences.
- A real-time implementation of ConsisTNet to meet the performance requirements of intraoperative guidance.
Fig. 1. a Temporal sequence of critical anatomical structures during the sellar phase of eTSA. b An example of prediction inconsistency (from PitSurgRT [4]) between video frames $I_{t-1}$ and $I_{t}$, with inconsistent predictions highlighted by white boxes
Related works
Recent deep learning advances have substantially improved computer-assisted interventions [10–12]. For instance, Staartjes et al. [11] employed a U-Net architecture to segment three endonasal anatomical structures in a proof-of-concept eTSA study. Das et al. [12] integrated segmentation and object detection for instrument tracking, subsequently applying this approach for skill assessment in eTSA on a high-fidelity phantom. To jointly identify safe entry zones and surrounding critical structures in eTSA, Das et al. [3] developed PAINet, a UNet++-based model for segmentation and landmark detection. Although PAINet showed promise, it faced challenges in real-time performance and detection accuracy. Mao et al. [4] addressed these limitations by introducing PitSurgRT, which integrates HRNet [7] to improve segmentation and landmark detection while achieving real-time inference. However, PitSurgRT processes video frames independently, ignoring the temporal continuity of video data and causing prediction inconsistencies (Fig. 1b).
To improve prediction consistency, researchers have incorporated temporal information into their models. Zhao et al. [13] proposed SSTAN, a semi-supervised spatio-temporal attention network that fuses a vision transformer-based attention mechanism with U-Net to segment polyps in video. Wang et al. [14] enhanced consistency by aggregating spatio-temporal features through ConvLSTM [8]. Furthermore, the multi-stage temporal convolutional network (MSTCN) [15] has been successfully employed to capture temporal patterns in video data.
Point-tracking methods have also leveraged temporal information. Teed and Deng [16] developed RAFT, which refines dense correspondences iteratively for accurate optical flow estimation. Doersch et al. [17] presented TAPIR, a method that tracks arbitrary points through per-frame initialization and temporal refinement, effectively handling occlusions and large displacements. Recently, Karaev et al. [9] introduced CoTracker2, a transformer-based model for multi-point tracking capable of preserving spatial relationships across frames, even under occlusions.
Building on these advancements, ConsisTNet combines spatio-temporal learning with semi-supervised techniques to improve real-time prediction stability in eTSA, specifically addressing the temporal consistency issues observed in previous works.
Fig. 2. ConsisTNet architecture includes: a HRNet backbone; a temporal module; and dual heads. CoTracker2-generated pseudo-labels are utilized during training to enhance spatio-temporal learning
Method
Overview of ConsisTNet
As illustrated in Fig. 2, the proposed ConsisTNet combines HRNet for extracting high-resolution spatial features, ConvLSTM for capturing temporal features, and dual heads to produce temporally consistent segmentation masks and landmark coordinates, localizing the safe entry zone and critical anatomical structures, respectively. To enhance temporal granularity, CoTracker2-generated pseudo-labels are incorporated during training for loss calculations, propagating real labels to adjacent frames, and addressing the challenge of limited labeled video data. These strategies improve temporal consistency while preserving the high accuracy of PitSurgRT.
During training, each input sample comprises three consecutive frames, with real labels provided for the last frame. Two branches process the input sample: One uses CoTracker2 to generate pseudo-labels for the first two frames, while the other passes the sample through ConsisTNet to generate logits for loss computation. In inference, only ConsisTNet is used, predicting all three frames of each sample simultaneously.
HRNet as the spatial module
HRNet’s ability to maintain both high- and low-resolution representations [7] has proven highly effective in previous work [4] for eTSA images, outperforming networks such as UNet++, DeepLabv3+, and PSPNet [3]. ConsisTNet therefore employs HRNet as its backbone. The input image is downsampled to 1/4 of its original size via two $3 \times 3$ convolutional layers with a stride of 2. HRNet consists of four stages with up to four parallel branches, which maintain feature maps at resolutions of 1/4, 1/8, 1/16, and 1/32. The first stage includes 4 residual units, while each subsequent stage adds a lower-resolution branch with double the channel width. By the final stage, the feature map widths are 48, 96, 192, and 384 channels. High- and low-resolution feature maps are fused at each stage, ensuring effective multi-scale information exchange. After the final stage, low-resolution feature maps are upsampled and fused with high-resolution ones, resulting in a final high-resolution feature map with 720 channels (48 + 96 + 192 + 384).
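The shape bookkeeping behind this fusion can be sketched as follows. This is only an illustration: the 720 × 1280 input size is an assumption (the text implies only a height of 720 pixels), the function names are hypothetical, and the channel total is consistent with channel-wise concatenation rather than confirmed as the exact fusion operator.

```python
# Illustrative shape bookkeeping for the final HRNet fusion.
# Assumption: each stride-2 convolution rounds odd sizes up (padding of 1).

BRANCH_STRIDES = (4, 8, 16, 32)        # resolutions 1/4, 1/8, 1/16, 1/32
BRANCH_CHANNELS = (48, 96, 192, 384)   # final-stage widths from the text


def ceil_div(a, b):
    """Integer division that rounds up."""
    return -(-a // b)


def branch_shapes(height, width):
    """Return (channels, h, w) for each final-stage branch."""
    return [(c, ceil_div(height, s), ceil_div(width, s))
            for c, s in zip(BRANCH_CHANNELS, BRANCH_STRIDES)]


def fused_shape(height, width):
    """Upsample every branch to the 1/4 resolution, then stack channels."""
    shapes = branch_shapes(height, width)
    _, h, w = shapes[0]                 # highest-resolution branch
    channels = sum(c for c, _, _ in shapes)
    return channels, h, w


for shape in branch_shapes(720, 1280):  # hypothetical input size
    print(shape)
print("fused:", fused_shape(720, 1280))
```

Under these assumptions the fused map keeps the 1/4 resolution of the first branch, which is what lets the temporal module and both heads operate on high-resolution features.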
Temporal module
ConvLSTM is integrated into ConsisTNet to capture temporal features in eTSA videos while preserving spatial details, making it well-suited for both segmentation and landmark detection tasks. The temporal module operates on HRNet-extracted feature maps using two ConvLSTM layers, each with 720 input and hidden channels, preserving high-resolution features for accurate localization. Both layers use a $3 \times 3$ kernel and a stride of 1 to capture spatial details while learning temporal dependencies. The two-layer module processes sequences of feature maps, updating hidden and cell states to learn short- and long-term dependencies, capturing hierarchical temporal features.
ConvLSTM cells employ four gates—input, forget, output, and cell state—to control information flow, selectively retaining or discarding data to maintain relevant features [8]. This enhances prediction consistency, reducing fluctuations between frames and leading to smoother, more stable predictions, especially during gradual surgical view changes. ConvLSTM’s ability to preserve spatial–temporal coherence while maintaining real-time performance makes it ideal for improving ConsisTNet’s stability.
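For reference, a commonly used form of the ConvLSTM update is given below. It is a sketch rather than the exact cell used here: it omits the peephole terms of the original formulation [8], and the symbols are introduced only for illustration ($X_t$ is the HRNet feature map at frame $t$, $H_t$ and $C_t$ are the hidden and cell states, $*$ denotes convolution, $\odot$ the element-wise product, and $\sigma$ the sigmoid).

$$\begin{aligned} i_t &= \sigma\big(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\big) \\ f_t &= \sigma\big(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\big) \\ o_t &= \sigma\big(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\big) \\ g_t &= \tanh\big(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g\big) \\ C_t &= f_t \odot C_{t-1} + i_t \odot g_t \\ H_t &= o_t \odot \tanh(C_t) \end{aligned}$$

Because every product with $X_t$ and $H_{t-1}$ is a convolution, the states $H_t$ and $C_t$ remain feature maps with the same spatial layout as the input, which is why spatial detail is preserved while temporal dependencies are learned.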
Dual-head outputs
Following the temporal module, two heads are connected: one for segmenting the sella and clival recess and another for detecting landmarks. The segmentation head uses a $1 \times 1$ convolutional layer with stride 1, followed by batch normalization, ReLU activation, and a final $1 \times 1$ convolutional layer with argmax activation and upsampling to recover the original resolution. For landmark detection, feature maps pass through a $1 \times 1$ convolutional layer with stride 1, average pooling, ReLU activation, and fully connected layers to output the coordinates of four landmarks.
Fig. 3. CoTracker2 is utilized for: a pseudo-label generation at frames $I_{t-1}$ and $I_{t-2}$ and b consistency evaluation. The arrows indicate the propagation of annotations in a and predictions in b across consecutive frames using CoTracker2
Tracking-based pseudo-label generation
Manual video annotation is often sparse due to the high workload, typically at lower frame rates, which disrupts temporal coherence between frames and limits the network’s ability to learn frame-to-frame consistency. To mitigate this, we use CoTracker2 to generate pseudo-labels for unannotated frames, enhancing temporal continuity in the training data. CoTracker2 can handle occlusions caused by surgical instruments in eTSA, making it well suited to this task. As shown in Fig. 3a, CoTracker2 can propagate real labels (landmarks and segmentation masks) from the annotated frame to unannotated frames to generate pseudo-labels (e.g., from $I_t$ to $I_{t-1}$ and $I_{t-2}$ in Fig. 3a). Since CoTracker2 was originally developed for point tracking, we propagate masks by sampling each mask into sparse feature points, tracking these points across frames, and reconstructing the mask using concave hull geometry. This method provides pseudo-labels for both landmarks and masks while maintaining temporal coherence.
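The sample, track, and reconstruct steps for mask propagation can be sketched in a few lines. This is a simplified illustration, not the pipeline used in this work: a constant shift (`shift_tracker`, a hypothetical stand-in) replaces CoTracker2, a convex hull replaces the concave hull, and all function names are assumptions.

```python
# Sketch only: a constant shift stands in for CoTracker2, and a convex hull
# stands in for the concave hull described in the text.

def boundary_points(mask, step=1):
    """Sample every `step`-th foreground pixel that touches the background."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    pts.append((x, y))
                    break
    return pts[::step]


def shift_tracker(points, dx, dy):
    """Hypothetical tracker: moves every point by the same (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in a consistent order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def rasterize(hull, height, width):
    """Mark every pixel lying inside or on the hull polygon."""
    mask = [[0] * width for _ in range(height)]
    n = len(hull)
    if n < 3:
        return mask
    for y in range(height):
        for x in range(width):
            if all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                   for i in range(n)):
                mask[y][x] = 1
    return mask


def propagate_mask(mask, dx, dy, step=1):
    """Sample points, track them, and rebuild the mask in the target frame."""
    tracked = shift_tracker(boundary_points(mask, step), dx, dy)
    return rasterize(convex_hull(tracked), len(mask), len(mask[0]))
```

A convex hull is the simpler choice for a sketch, but it fills in any indentation of the region, whereas a concave hull can follow non-convex mask outlines more closely.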
Loss function
In our previous work [4], we demonstrated the effectiveness of using Dice loss for segmentation, as well as Wing loss and Focal loss (FL) for landmark detection. In this paper, we extend the loss function by incorporating PyTorch’s smooth L1 loss for both tasks. This extension is motivated by the minimal movement observed during the sellar phase of surgery, where the endoscope remains largely stationary. As a result, predictions for segmentation and landmark detection should exhibit temporal consistency across frames. Unlike Dice, Wing, and Focal losses, which measure differences between predictions and ground-truth/pseudo-labels, smooth L1 loss promotes temporal coherence by computing differences between predictions of consecutive frames. This ensures consistency over time. The overall loss function is defined as:
$$\begin{aligned} \mathrm{Loss} = {} & \big(w_{1} \cdot \mathrm{Dice} + w_{2} \cdot \mathrm{L1}_{\mathrm{seg}}\big) \\ & + \big(w_{3} \cdot \mathrm{Wing} + w_{4} \cdot \mathrm{FL} + w_{5} \cdot \mathrm{L1}_{\mathrm{ldmk}}\big), \end{aligned}$$
where $\mathrm{L1}_{\mathrm{seg}}$ and $\mathrm{L1}_{\mathrm{ldmk}}$ are smooth L1 losses for the segmentation and landmark detection tasks, respectively. The weights $w_{1}$, $w_{2}$, $w_{3}$, $w_{4}$, and $w_{5}$ are hyperparameters that balance the contributions of the individual loss components.
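How the temporal smooth L1 terms enter the overall loss can be sketched as below. This is a minimal illustration under stated assumptions: predictions are flattened to plain lists of numbers, the Dice, Wing, and Focal terms are passed in as precomputed values, the equal weights are placeholders (the actual weight values are not given here), and the function names are hypothetical. The `smooth_l1` function follows the standard definition with threshold `beta = 1.0`, which is PyTorch's default.

```python
# Sketch of the combined loss; weights and helper names are assumptions.

def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 between two equal-length sequences of numbers."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def temporal_l1(frame_preds):
    """Average smooth L1 between predictions of consecutive frames."""
    pairs = list(zip(frame_preds[:-1], frame_preds[1:]))
    return sum(smooth_l1(a, b) for a, b in pairs) / len(pairs)


def total_loss(dice, wing, focal, seg_preds, ldmk_preds,
               w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the segmentation and landmark loss terms."""
    l1_seg = temporal_l1(seg_preds)
    l1_ldmk = temporal_l1(ldmk_preds)
    return (w[0] * dice + w[1] * l1_seg) \
        + (w[2] * wing + w[3] * focal + w[4] * l1_ldmk)


# Three frames per sample: identical predictions add no temporal penalty.
print(temporal_l1([[0.2, 0.8], [0.2, 0.8], [0.2, 0.8]]))
```

Because the two L1 terms compare predictions with each other rather than with labels, they penalize frame-to-frame fluctuation even when each individual frame is close to its label.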
Experimental setup
Dataset
The dataset consists of 635 manually annotated frames, created by four neurosurgeons, extracted from 64 complete eTSA videos. The videos have a mean duration of 74 min, and each video is divided into four distinct surgical phases [4]. Our work focuses on the sellar phase, which is critical for identifying key anatomical structures. For each video, a 10-second segment preceding sellotomy was manually annotated, with frames initially sampled at 1 FPS. Additional details about the dataset setup can be found in our previous works [3, 4]. Due to the inherent difficulty of labeling invisible features on sphenoidal bones, some anatomical structures remain partially annotated. The sella and clival recess are annotated in all frames, while other structures are labeled in up to 65% of the frames. Our primary focus is on the sella region, where the tumor is located posteriorly. While the clival recess is typically visible during the sellar phase, it is clinically less significant. Nevertheless, it can serve as a reference to help the model identify the locations of other anatomies.
To address the challenge of sparse annotations and to enable the model to learn spatio-temporal continuity, we resampled the 10-second clips at 3 FPS for training. This frame rate balances temporal continuity and computational efficiency. The manually annotated frames serve as ground truth, while pseudo-labels are generated for the additional frames using tracking methods, as detailed in Sect. Tracking-based pseudo-label generation. This process augments the dataset with 1,270 pseudo-labeled frames, increasing the total number of annotated frames to 1,905. For evaluation, we adopt a fivefold cross-validation strategy consistent with prior works [4], ensuring that all frames from the same patient are assigned to the same fold. During inference, the same 3 FPS sampling rate is applied to maintain consistency with the training setup.
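The frame counts above follow from simple arithmetic, sketched here under the assumption that every manually annotated frame closes one three-frame sample whose two preceding frames receive pseudo-labels. The function names are hypothetical.

```python
# Frame-count bookkeeping for the resampled training data.
# Assumption: one annotated frame per three-frame sample, always the last.

ANNOTATED_FRAMES = 635   # manually labelled frames (from the text)
ANNOTATION_FPS = 1       # original sampling rate of the annotations
TRAINING_FPS = 3         # resampled rate used for training


def label_sources(clip_len=3):
    """Label source per frame in one sample; only the last frame is real."""
    return ["pseudo"] * (clip_len - 1) + ["real"]


def frame_counts(annotated, annotation_fps, training_fps):
    """Return (pseudo-labelled frames, total labelled frames)."""
    extra_per_label = training_fps // annotation_fps - 1
    pseudo = annotated * extra_per_label
    return pseudo, annotated + pseudo


print(label_sources())
print(frame_counts(ANNOTATED_FRAMES, ANNOTATION_FPS, TRAINING_FPS))
```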
Evaluation metric
Given the relatively small size of our dataset (64 surgical videos), we use fivefold cross-validation [18] to maximize the data available for training while maintaining rigorous validation. The model is evaluated in two aspects: prediction performance and consistency.
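As a concrete reference for the metrics used in this evaluation, the sketch below computes IoU and F1-score for a pair of masks, and the mean distance error and percentage of correct keypoints for a set of landmarks. It is an illustration under assumptions: masks are represented as sets of pixel coordinates, the image height of 720 pixels is inferred from the 72-pixel threshold quoted for MPCK10, and the function names are hypothetical.

```python
import math

# Sketch of the evaluation metrics; data layout and names are assumptions.


def iou_and_f1(pred, truth):
    """IoU and F1-score for two masks given as sets of (x, y) pixels."""
    inter = len(pred & truth)
    union = len(pred | truth)
    iou = inter / union if union else 1.0
    denom = len(pred) + len(truth)
    f1 = 2 * inter / denom if denom else 1.0
    return iou, f1


def landmark_metrics(pred, truth, image_height=720, thresholds=(5, 10, 20)):
    """Mean distance error (pixels) and PCK (%) at each threshold.

    A landmark counts as correct at threshold t when its distance to the
    ground truth is at most t percent of the image height.
    """
    dists = [math.dist(p, g) for p, g in zip(pred, truth)]
    mean_distance = sum(dists) / len(dists)
    pck = {t: 100.0 * sum(d <= image_height * t / 100 for d in dists) / len(dists)
           for t in thresholds}
    return mean_distance, pck


pred_mask = {(0, 0), (1, 0), (2, 0)}
true_mask = {(1, 0), (2, 0), (3, 0)}
print(iou_and_f1(pred_mask, true_mask))
```

For a single pair of masks the two segmentation scores are linked by F1 = 2·IoU / (1 + IoU), so they rank individual predictions identically, although averages over many frames need not obey the same relation exactly.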
Performance evaluation: Performance is evaluated by comparing predictions $I_{t}^{p}$ with ground truth $I_{t}^{g}$ at frame $I_{t}$. Metrics include Intersection over Union (IoU) and $\mathrm{F}_{1}$-score for segmentation, as well as mean distance error (mDistance) and the percentage of correct keypoints (MPCK5, MPCK10, MPCK20) for landmarks. MPCK10, for instance, denotes the percentage of landmarks within 10% of the image height (72 pixels) from the ground truth. Our prior clinical study [4] validated MPCK20 as a reliable and clinically meaningful criterion for selecting models that are sufficiently accurate for surgical guidance.
Table 1. Performance and consistency evaluation (mean ± std, fivefold cross-validation)
| Method | Sella IoU (%) ↑ | Clival recess IoU (%) ↑ | Sella $\mathrm{F}_{1}$-score (%) ↑ | Clival recess $\mathrm{F}_{1}$-score (%) ↑ | Landmarks MPCK5 (%) ↑ | Landmarks MPCK10 (%) ↑ | Landmarks MPCK20 (%) ↑ | Landmarks mDistance (pixel) ↓ |
|---|---|---|---|---|---|---|---|---|
| Performance evaluation | | | | | | | | |
| PitSurgRT [4] | $66.59 \pm 2.29$ | $\mathbf{45.81 \pm 7.19}$ | $79.92 \pm 1.66$ | $\mathbf{62.51 \pm 6.70}$ | $26.08 \pm 7.67$ | $68.88 \pm 8.55$ | $94.75 \pm 4.79$ | $63.05 \pm 10.20$ |
| PAINet [3] | $64.31 \pm 2.85$ | $45.32 \pm 9.17$ | $78.24 \pm 2.11$ | $61.79 \pm 9.28$ | $7.92 \pm 1.71$ | $34.38 \pm 7.47$ | $83.31 \pm 6.83$ | $97.12 \pm 11.06$ |
| MSTCN [15] | $64.63 \pm 4.47$ | $41.73 \pm 8.61$ | $78.43 \pm 3.28$ | $58.36 \pm 8.80$ | $13.72 \pm 11.27$ | $41.05 \pm 13.85$ | $85.39 \pm 7.84$ | $89.76 \pm 18.38$ |
| SSTAN [13] | $41.52 \pm 3.67$ | $23.69 \pm 4.63$ | $58.58 \pm 3.61$ | $38.07 \pm 6.35$ | $13.96 \pm 6.43$ | $43.49 \pm 13.78$ | $86.24 \pm 9.82$ | $88.58 \pm 20.20$ |
| ConsisTNet | $\mathbf{66.84 \pm 3.31}$ | $45.08 \pm 7.43$ | $\mathbf{80.62 \pm 2.29}$ | $60.24 \pm 8.21$ | $\mathbf{34.05 \pm 6.54}$ |
\setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{74.74\pm 8.97}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{96.34\pm 4.94}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{55.10\pm 9.77}$$\end{document} Consistency EvaluationPitSurgRT [4] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$82.85\pm 1.13$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$67.41\pm 4.87$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$90.62\pm 0.68$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$80.43\pm 3.58$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$94.14\pm 2.68$$\end{document} \documentclass[12pt]{minimal} 
\usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$98.69\pm 1.01$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.93\pm 0.13$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$12.79\pm 2.47$$\end{document} PAINet [3] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$81.29\pm 1.83$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$68.23\pm 2.90$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$89.67\pm 1.12$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$81.08\pm 2.03$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} 
\setlength{\oddsidemargin}{-69pt} \begin{document}$$82.70\pm 2.64$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$97.85\pm 0.69$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.96\pm 0.05$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$23.49\pm 1.53$$\end{document} MSTCN [15] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$80.72\pm 2.51$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$66.83\pm 5.46$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$89.31\pm 1.55$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$79.98\pm 4.10$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} 
\usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{97.92\pm 0.53}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.54\pm 0.30$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.98\pm 0.04$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$4.28\pm 0.53$$\end{document} SSTAN [13] \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$62.71\pm 4.39$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$40.71\pm 3.57$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$76.99\pm 3.27$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} 
\begin{document}$$57.77\pm 3.61$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$98.09\pm 0.48$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.49\pm 0.31$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$99.93\pm 0.05$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{4.24\pm 0.60}$$\end{document} ConsisTNet \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{86.63\pm 1.68}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{73.78\pm 3.38}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{92.83\pm 0.97}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} 
\usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{84.87\pm 2.24}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$97.92\pm 2.02$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{99.70\pm 0.18}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varvec{99.98\pm 0.04}$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$7.18\pm 1.75$$\end{document} Bolded values represent the highest performance for each metric in the respective column
Consistency evaluation: Inspired by Varghese et al. [19], temporal consistency is evaluated using CoTracker2. As shown in Fig. 3b, the model’s segmentation and landmark predictions $I_{t-1}^p$ are propagated from frame $I_{t-1}$ to its consecutive frame $I_{t}$ using CoTracker2, resulting in a tracked prediction $\tilde{I}_{t}^p$. Comparing $\tilde{I}_{t}^p$ with the model predictions $I_{t}^p$ at frame $I_{t}$, we calculate the model’s temporal consistency. Metrics such as IoU, $\mathrm{F}_1$-score, mDistance, and MPCK are used for this evaluation.
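The comparison between the tracked prediction and the direct prediction at frame $I_{t}$ can be illustrated with a minimal sketch. It assumes the tracking step has already produced the tracked mask and landmarks (the tracker itself is not called here), represents masks as nested lists of 0/1, and uses hypothetical function names; treating two empty masks as perfectly consistent is also an assumption rather than a convention stated in the paper.

```python
from math import hypot


def mask_iou(mask_a, mask_b):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumption: two empty masks count as perfectly consistent.
    return 1.0 if union == 0 else inter / union


def mean_landmark_distance(points_a, points_b):
    """Mean Euclidean distance (pixels) between paired (x, y) landmarks."""
    dists = [hypot(xa - xb, ya - yb)
             for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return sum(dists) / len(dists)


# Toy data: prediction tracked from frame t-1 vs. direct prediction at frame t.
tracked_mask = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
predicted_mask = [[0, 0, 1, 1],
                  [0, 1, 1, 0]]
consistency_iou = mask_iou(tracked_mask, predicted_mask)

tracked_pts = [(10.0, 10.0), (20.0, 20.0)]
predicted_pts = [(13.0, 14.0), (20.0, 20.0)]
consistency_dist = mean_landmark_distance(tracked_pts, predicted_pts)
```

In this toy case the two masks share 3 of 5 foreground pixels, and one landmark is displaced by 5 pixels while the other is unchanged.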
Training details
Training and validation are performed on a workstation with an NVIDIA RTX A6000 GPU (48 GB). The HRNet backbone is initialized with the pre-trained weights from PitSurgRT [4] and remains frozen while the other components of ConsisTNet are trained. Optimization uses SGD with a momentum of 0.9; the learning rate starts at 0.01 and decays linearly to 0.0001 over 200 epochs. For the first 50 epochs, only the temporal module and segmentation head are trained, with the landmark detection head frozen. In the remaining 150 epochs, all components except HRNet are trained jointly. The weights $w_{1}$, $w_{2}$, $w_{3}$, $w_{4}$, and $w_{5}$ are set to 0.9, 0.1, 0.8, 0.2, and 0.005, respectively.
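The two-stage schedule described above can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes the learning rate is updated once per epoch and reaches 0.0001 at the final epoch, and the component names are hypothetical labels.

```python
BASE_LR = 0.01          # initial learning rate
FINAL_LR = 0.0001       # learning rate at the last epoch
TOTAL_EPOCHS = 200
STAGE_ONE_EPOCHS = 50   # landmark head stays frozen during these epochs


def learning_rate(epoch):
    """Linear decay from BASE_LR at epoch 0 to FINAL_LR at the last epoch."""
    frac = epoch / (TOTAL_EPOCHS - 1)
    return BASE_LR + frac * (FINAL_LR - BASE_LR)


def trainable_components(epoch):
    """Components updated at a given epoch; the HRNet backbone is never included."""
    if epoch < STAGE_ONE_EPOCHS:
        return {"temporal_module", "segmentation_head"}
    return {"temporal_module", "segmentation_head", "landmark_head"}
```

For example, `trainable_components(49)` omits the landmark head, while `trainable_components(50)` includes it.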
The proposed method is implemented in PyTorch 1.13.1, using Python 3.8.18 and CUDA 11.5. The code is available at https://github.com/ZH-Mao/PitVideo.git. To achieve real-time performance, we optimize ConsisTNet on NVIDIA GPUs using TensorRT with 32-bit floating-point (FP32) and 16-bit floating-point (FP16) precision.
Results and discussion
Comparison with state-of-the-art methods
Quantitative analysis: We evaluate the performance of the proposed ConsisTNet against two image-based models, PitSurgRT [4] and PAINet [3], as well as two video-based models, MSTCN [15] and SSTAN [13]. All models, except PAINet, use the HRNet backbone with identical weights; PAINet is based on a pretrained EfficientNetB3 backbone [3]. While PitSurgRT and PAINet are trained solely with real labels, the video-based models (MSTCN, SSTAN, and ConsisTNet) utilize both real and pseudo-labels. The comparative results based on fivefold cross-validation are presented in Table 1.
As shown in the upper portion of Table 1, our proposed ConsisTNet achieves segmentation and landmark detection accuracies comparable to those of PitSurgRT. Moreover, it outperforms other image-based and video-based methods, including PAINet, MSTCN, and SSTAN, in both accuracy and consistency metrics. Regarding the consistency evaluation, the lower part of Table 1 illustrates a significant improvement in prediction stability across consecutive video frames. Specifically, compared to PitSurgRT, ConsisTNet improves segmentation consistency in IoU by a relative 4.56% for the sella (from 82.85% to 86.63%) and 9.45% for the clival recess (from 67.41% to 73.78%). In terms of landmark detection, ConsisTNet markedly enhances consistency, reducing the mean distance error from $12.79 \pm 2.47$ pixels to $7.18 \pm 1.75$ pixels, a relative improvement of 43.86%.

Fig. 4. Qualitative comparison of prediction performance and consistency between the proposed and compared methods. The IoU for the focused area, sella, is displayed in the bottom right of each image

Table 2 Ablation study for assessing the accuracy of the CoTracker2 compared with RAFT and TAPIR (single fold)

| Method | Sella IoU (%) $\uparrow$ | Clival recess IoU (%) $\uparrow$ | Sella $\mathrm{F}_1$-score (%) $\uparrow$ | Clival recess $\mathrm{F}_1$-score (%) $\uparrow$ | Landmarks MPCK5 (%) $\uparrow$ | Landmarks MPCK10 (%) $\uparrow$ | Landmarks MPCK20 (%) $\uparrow$ | Landmarks mDistance (pixel) $\downarrow$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Performance evaluation | | | | | | | | |
| RAFT [16] | 86.59 | 70.09 | 92.81 | 82.41 | 96.38 | 99.26 | 99.59 | 6.93 |
| TAPIR [17] | 55.46 | 34.66 | 71.34 | 51.47 | 23.37 | 93.03 | 100.00 | 46.01 |
| CoTracker2 [9] | $\mathbf{89.23}$ | $\mathbf{74.11}$ | $\mathbf{94.31}$ | $\mathbf{85.12}$ | $\mathbf{99.33}$ | $\mathbf{100.00}$ | $\mathbf{100.00}$ | $\mathbf{4.71}$ |

Bolded values represent the highest performance for each metric in the respective column

Table 3 Ablation study for assessing the effectiveness of the CoTracker2 module, temporal module, and consistency loss (single fold)

| Method | Sella IoU (%) $\uparrow$ | Clival recess IoU (%) $\uparrow$ | Sella $\mathrm{F}_1$-score (%) $\uparrow$ | Clival recess $\mathrm{F}_1$-score (%) $\uparrow$ | Landmarks MPCK5 (%) $\uparrow$ | Landmarks MPCK10 (%) $\uparrow$ | Landmarks MPCK20 (%) $\uparrow$ | Landmarks mDistance (pixel) $\downarrow$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Performance Evaluation | | | | | | | | |
| PitSurgRT | 69.98 | $\mathbf{53.31}$ | 82.34 | $\mathbf{69.55}$ | 24.14 | 78.68 | 97.81 | 55.06 |
| PitSurgRT+CoTracker2 | 63.22 | 45.75 | 77.47 | 62.78 | 14.73 | 61.44 | 96.55 | 66.87 |
| PitSurgRT+CoTracker2+ConvLSTM | 67.74 | 47.19 | 80.76 | 64.12 | 22.88 | 40.44 | 89.97 | 84.60 |
| PitSurgRT+CoTracker2+ConvLSTM+Consistency loss | $\mathbf{71.16}$ | 52.34 | $\mathbf{83.15}$ | 68.71 | $\mathbf{32.60}$ | $\mathbf{84.64}$ | $\mathbf{100.00}$ | $\mathbf{48.10}$ |
| Consistency Evaluation | | | | | | | | |
| PitSurgRT | 82.90 | 68.83 | 90.65 | 81.54 | 95.65 | 98.55 | 100.00 | 11.22 |
| PitSurgRT+CoTracker2 | 80.83 | 64.68 | 89.40 | 78.55 | 94.86 | 98.46 | 100.00 | 12.89 |
| PitSurgRT+CoTracker2+ConvLSTM | 83.67 | 70.02 | 91.11 | 82.37 | 98.39 | 99.79 | 99.89 | 8.33 |
| PitSurgRT+CoTracker2+ConvLSTM+Consistency loss | $\mathbf{86.70}$ | $\mathbf{73.43}$ | $\mathbf{92.88}$ | $\mathbf{84.68}$ | $\mathbf{99.69}$ | $\mathbf{99.90}$ | $\mathbf{100.00}$ | $\mathbf{5.95}$ |

Bolded values represent the highest performance for each metric in the respective column
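The consistency improvements quoted for Table 1 (4.56%, 9.45%, and 43.86%) are relative changes with respect to PitSurgRT, computed from the mean values of the consistency evaluation. A short check, using hypothetical helper names:

```python
def relative_gain(baseline, new):
    """Relative increase of `new` over `baseline`, in percent (higher is better)."""
    return 100.0 * (new - baseline) / baseline


def relative_reduction(baseline, new):
    """Relative decrease of `new` below `baseline`, in percent (lower is better)."""
    return 100.0 * (baseline - new) / baseline


# Mean consistency values from Table 1: PitSurgRT (baseline) vs. ConsisTNet.
sella_iou_gain = relative_gain(82.85, 86.63)
clival_iou_gain = relative_gain(67.41, 73.78)
mdistance_reduction = relative_reduction(12.79, 7.18)
```

Rounded to two decimals, these evaluate to 4.56, 9.45, and 43.86, matching the figures in the text. The absolute IoU differences are 3.78 and 6.37 percentage points, respectively.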
Qualitative analysis: Fig. 4 presents qualitative results from four surgeries, comparing prediction accuracy and consistency of segmentation and landmark detection across video frames. A matrix in the bottom right of each image provides quantitative results. ConsisTNet demonstrates similar prediction accuracy to PitSurgRT (first row of matrices), often achieving the best or comparable results in sella segmentation and landmark detection among competing methods. While MSTCN and SSTAN show strong landmark detection consistency with lower mDistance (second row of matrices), they struggle with prediction accuracy and segmentation consistency, e.g., they exhibit significant errors for landmark detection accuracy (over 100 pixels) in the first case and lower segmentation consistency for all four cases. Overall, ConsisTNet outperforms other methods by balancing superior consistency and good prediction accuracy.
Using TensorRT, we accelerated the trained models. Compared to the PyTorch model (inference speed: $10.13 \pm 0.03$ FPS), the accelerated models achieved speeds of $113.10 \pm 8.22$ FPS (FP32) and $202.00 \pm 7.09$ FPS (FP16), without performance loss.
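The text does not state how the FPS figures were measured. As a minimal sketch of one common way to obtain a mean ± standard deviation throughput, the helper below times repeated calls to an inference function over several runs. The name `measure_fps` and the `infer` callable are hypothetical and stand in for whatever wraps the PyTorch or TensorRT model.

```python
import statistics
import time


def measure_fps(infer, runs=5, iters=100, warmup=10):
    """Return (mean, std) frames per second over `runs` timed runs.

    `infer` is a hypothetical zero-argument callable that processes one frame.
    For GPU models, the callable should block until the result is ready,
    otherwise the timing only covers the kernel launch.
    """
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup):
        infer()

    fps_per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(iters):
            infer()
        elapsed = time.perf_counter() - start
        fps_per_run.append(iters / elapsed)

    # stdev needs at least two runs.
    return statistics.mean(fps_per_run), statistics.stdev(fps_per_run)


if __name__ == "__main__":
    mean_fps, std_fps = measure_fps(lambda: sum(range(1000)))
    print(f"{mean_fps:.2f} +/- {std_fps:.2f} FPS")
```

Taken at their means, the reported speeds correspond to roughly an 11x (FP32) and 20x (FP16) speed-up over the PyTorch model.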
Ablation studies
Accuracy of pseudo-label generation: Following the pseudo-labeling approach described in Sect. Tracking-based pseudo-label generation, we evaluate the accuracy of CoTracker2 by comparing it with two state-of-the-art methods, RAFT [16] and TAPIR [17]. We propagate annotations from frames with ground-truth labels to other labeled frames and compare the resulting pseudo-labels against the real labels. For point tracking and pseudo-label generation using RAFT, we utilized its optical flow estimates to propagate points across frames. Specifically, (1) for each point in frame $I_t$, we computed the optical flow to frame $I_{t-1}$; (2) the resulting flow vectors were then used to estimate the new position of the point in frame $I_{t-1}$; (3) this process was iteratively repeated for subsequent frames. For the point-tracking method TAPIR, we followed the same pipeline used for CoTracker2 to generate pseudo-labels, as described in Sect. Tracking-based pseudo-label generation.
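The flow-based propagation steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the dense flow fields are assumed to be precomputed (for example by RAFT) and passed in as H x W grids of (dx, dy) displacements, the flow is sampled at the nearest pixel, and the function name `propagate_points` is hypothetical.

```python
def propagate_points(points, flows):
    """Carry (x, y) points through a sequence of dense flow fields.

    points: iterable of (x, y) pixel coordinates in the starting frame.
    flows:  list of H x W grids; flows[k][y][x] is the (dx, dy) displacement
            from frame k to the next frame in the propagation order.
            Nested lists and NumPy arrays of shape (H, W, 2) both work.
    Returns a list of len(flows) + 1 point lists, one per frame.
    """
    track = [[(float(x), float(y)) for x, y in points]]
    for flow in flows:
        height, width = len(flow), len(flow[0])
        moved = []
        for x, y in track[-1]:
            # Sample the flow at the nearest pixel, clamped to the image.
            xi = min(max(int(round(x)), 0), width - 1)
            yi = min(max(int(round(y)), 0), height - 1)
            dx, dy = flow[yi][xi]
            moved.append((x + float(dx), y + float(dy)))
        track.append(moved)
    return track


if __name__ == "__main__":
    # Toy example: a constant flow of (+2, -1) pixels per frame.
    flow = [[(2.0, -1.0)] * 32 for _ in range(32)]
    print(propagate_points([(10, 20)], [flow, flow, flow])[-1])
```

Because each step samples the flow at the previously estimated position, errors accumulate over the chain of frames, which is one reason a chained optical-flow baseline can drift more than a dedicated point tracker.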
As shown in Table 2, CoTracker2 consistently outperforms RAFT and TAPIR for both landmarks and masks, demonstrating its robustness and confirming that it is the most suitable method for generating high-quality pseudo-labels in our occlusion-prone surgical videos.
Effectiveness of each module in ConsisTNet: We perform four ablation experiments to assess the effectiveness of the CoTracker2 module, the temporal module, and the consistency loss: (1) PitSurgRT trained with real labels only; (2) PitSurgRT trained with real and pseudo-labels; (3) PitSurgRT with ConvLSTM (no consistency loss), trained with real and pseudo-labels; (4) PitSurgRT with ConvLSTM and consistency loss, trained with real and pseudo-labels.
The results in Table 3 highlight the contribution of each component. Introducing CoTracker2 alone does not boost accuracy or consistency because the sellar phase is relatively stationary and pseudo-labels, generated between consecutive ground-truth labels that are 1 s apart, offer limited additional variation. Moreover, real labels that are obstructed by surgical instruments can lead to incomplete or inaccurate pseudo-labels. However, adding the temporal module (ConvLSTM) mitigates these issues by modeling frame-to-frame dependencies. When combined with the consistency loss, our model matches the baseline’s accuracy while significantly improving consistency (see Table 3). Specifically, relative to the PitSurgRT baseline, the IoU consistency improves by 4.58% (sella) and 6.68% (clival recess), and landmark detection consistency improves by 46.97%, reducing the mean distance error from 11.22 pixels to 5.95 pixels. A fivefold cross-validation in Table 1 further confirms the robustness of these enhancements.

Table 4. Ablation study for assessing the impact of the hyperparameters on model performance (single fold)

| Weights $[w_1, w_2, w_3, w_4, w_5]$ | Performance: Sella IoU (%) $\uparrow$ | Performance: Clival recess IoU (%) $\uparrow$ | Performance: Landmarks MPCK10 (%) $\uparrow$ | Consistency: Sella IoU (%) $\uparrow$ | Consistency: Clival recess IoU (%) $\uparrow$ | Consistency: Landmarks MPCK10 (%) $\uparrow$ | Overall* $\uparrow$ |
|---|---|---|---|---|---|---|---|
| [0.9, 0.00, 0.8, 0.2, 0.000] | 67.74 | 47.19 | 40.44 | 83.67 | 70.02 | 99.79 | 274.54 |
| [0.9, 0.01, 0.8, 0.2, 0.001] | 71.29 | 56.73 | 69.28 | 85.53 | 70.87 | 99.46 | 310.95 |
| [0.9, 0.05, 0.8, 0.2, 0.001] | 70.85 | 55.57 | 78.37 | 85.57 | 72.31 | 100.00 | 320.52 |
| [0.9, 0.10, 0.8, 0.2, 0.005] | 71.16 | 52.34 | 84.64 | 86.70 | 73.43 | 99.90 | $\mathbf{326.36}$ |
| [0.9, 0.15, 0.8, 0.2, 0.010] | 70.90 | 53.39 | 80.56 | 86.57 | 74.62 | 99.79 | 323.10 |
| [0.9, 0.20, 0.8, 0.2, 0.020] | 70.81 | 52.69 | 80.13 | 86.96 | 74.89 | 100.00 | 322.81 |

Note: * The overall score is calculated as $\left[\frac{\mathrm{IoU_{S}} + \mathrm{IoU_{C}}}{2} + \mathrm{MPCK10}\right]_{\mathrm{Per.}} + \left[\frac{\mathrm{IoU_{S}} + \mathrm{IoU_{C}}}{2} + \mathrm{MPCK10}\right]_{\mathrm{Con.}}$, where $\mathrm{IoU_{S}}$ and $\mathrm{IoU_{C}}$ represent the IoU metrics for the Sella and Clival recess, respectively. Here, $\mathrm{Per.}$ and $\mathrm{Con.}$ are abbreviations for performance and consistency evaluations, respectively. The bolded value represents the overall highest performance and consistency achieved by the model.
Interestingly, adding CoTracker2 alone can initially degrade performance due to noisy pseudo-labels, especially when the real labels are occluded by instruments. Although introducing ConvLSTM recovers some performance, it still lags behind the baseline, suggesting that temporal modeling alone is insufficient for fully leveraging pseudo-labels. The consistency loss proves crucial for refining these labels and temporal features. Only with all components—CoTracker2, ConvLSTM, and the consistency loss—does performance exceed the baseline, demonstrating a synergistic effect for surgical video analysis.
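The consistency gains quoted for the full model are relative changes with respect to the PitSurgRT baseline. As a quick check, they can be recomputed from the consistency rows of Table 3; the helper name `relative_change` is introduced here only for illustration.

```python
def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Consistency values from Table 3: PitSurgRT baseline vs. the full model.
sella_iou = relative_change(82.90, 86.70)    # about +4.58 %
clival_iou = relative_change(68.83, 73.43)   # about +6.68 %
mdistance = relative_change(11.22, 5.95)     # about -46.97 % (lower is better)

if __name__ == "__main__":
    print(f"Sella IoU:         {sella_iou:+.2f} %")
    print(f"Clival recess IoU: {clival_iou:+.2f} %")
    print(f"mDistance:         {mdistance:+.2f} %")
```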
Selection of hyperparameters: Following PitSurgRT [4], we incorporate a temporal module and a pseudo-label generation module to capture temporal continuity in this work. Alongside the Dice, Wing, and Focal losses from PitSurgRT [4], we add two L1 losses to foster frame-to-frame prediction consistency. While we retain the same weights for Dice, Wing, and Focal losses, we fine-tune the L1 loss weights. Table 4 illustrates the impact of these hyperparameters on model performance. To balance accuracy and consistency, we consider an overall score (last column in Table 4), which indicates that assigning weights $w_{1}=0.9,\, w_{2}=0.1,\, w_{3}=0.8,\, w_{4}=0.2,$ and $w_{5}=0.005$ is optimal for this task.
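The selection by overall score can be reproduced from the Table 4 entries. The sketch below applies the overall-score definition from the table note to each weight setting and picks the maximum; the function and variable names are illustrative.

```python
def overall_score(per, con):
    """Overall score from the Table 4 note.

    per, con: (IoU_sella, IoU_clival, MPCK10) in percent for the
    performance and consistency evaluations, respectively.
    """
    def part(iou_s, iou_c, mpck10):
        return (iou_s + iou_c) / 2 + mpck10

    return part(*per) + part(*con)


# Weights [w1, w2, w3, w4, w5] -> (performance, consistency) from Table 4.
table4 = {
    (0.9, 0.00, 0.8, 0.2, 0.000): ((67.74, 47.19, 40.44), (83.67, 70.02, 99.79)),
    (0.9, 0.01, 0.8, 0.2, 0.001): ((71.29, 56.73, 69.28), (85.53, 70.87, 99.46)),
    (0.9, 0.05, 0.8, 0.2, 0.001): ((70.85, 55.57, 78.37), (85.57, 72.31, 100.00)),
    (0.9, 0.10, 0.8, 0.2, 0.005): ((71.16, 52.34, 84.64), (86.70, 73.43, 99.90)),
    (0.9, 0.15, 0.8, 0.2, 0.010): ((70.90, 53.39, 80.56), (86.57, 74.62, 99.79)),
    (0.9, 0.20, 0.8, 0.2, 0.020): ((70.81, 52.69, 80.13), (86.96, 74.89, 100.00)),
}

scores = {w: overall_score(per, con) for w, (per, con) in table4.items()}
best_weights = max(scores, key=scores.get)

if __name__ == "__main__":
    for weights, score in scores.items():
        print(list(weights), f"{score:.2f}")
    print("best:", list(best_weights))
```

The recomputed scores agree with the Overall column to within rounding of the tabulated inputs, and the maximum falls on the setting with $w_2 = 0.1$ and $w_5 = 0.005$.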
Conclusion
This paper introduced ConsisTNet, a novel spatio-temporal model designed to enhance consistency in anatomical localization during eTSA. ConsisTNet integrates HRNet for feature extraction, a temporal module using ConvLSTM, and CoTracker2 for pseudo-label generation and consistency evaluation. Our results demonstrate that ConsisTNet outperforms both image-based and video-based state-of-the-art methods in accuracy and consistency. Notably, it achieved significant improvements in segmentation and landmark detection consistency compared to our previous model, PitSurgRT. The optimized implementation using TensorRT enables real-time performance. This work represents a significant step toward more reliable and stable anatomical localization in eTSA, potentially enhancing surgical precision and safety. Future work will deploy this system and assess clinical impact. In addition, given the relatively small size of our current dataset, we are actively working on collecting and annotating additional surgical videos, which will enable more comprehensive validation of our approach in future studies.
References
- 1. Enkaoua A, Islam M, Ramalhinho J, Dowrick T, Booker J, Khan DZ, Marcus HJ, Clarkson MJ (2023) Image-guidance in endoscopic pituitary surgery: an in-silico study of errors involved in tracker-based techniques. Front Surgery 10. doi: 10.3389/fsurg.2023.1222859. PMC10540627. PMID 37780914
- 2. Mao Z, Das A, Islam M, Khan DZ, Williams SC, Hanrahan JG, Borg A, Dorward NL, Clarkson MJ, Stoyanov D, et al (2024) PitSurgRT: real-time localization of critical anatomical structures in endoscopic pituitary surgery. Int J Comput Assist Radiol Surg, pp 1–8. doi: 10.1007/s11548-024-03094-2. PMC11178578. PMID 38528306
- 3. Das A, Sidiqi B, Mennillo L, Mao Z, Brudfors M, Xochicale M, Khan DZ, Newall N, Hanrahan JG, Clarkson MJ, et al (2024) Automated surgical skill assessment in endoscopic pituitary surgery using real-time instrument tracking on a high-fidelity bench-top phantom. Healthcare Technol Lett. doi: 10.1049/htl2.12101. PMC11665785. PMID 39720762
- 4. Bradshaw TJ, Huemann Z, Hu J, Rahmim A (2023) A guide to cross-validation for artificial intelligence in medical imaging. Radiol Artif Intell 5(4):220232. doi: 10.1148/ryai.220232. PMC10388213. PMID 37529208
