Stable distance regression via spatial–frequency state space model for robot-assisted endomicroscopy
Mengyi Zhou, Chi Xu, Stamatia Giannarou

TL;DR
This paper introduces a new model for accurately measuring distances in microscopic imaging during robotic surgeries, improving precision and stability.
Contribution
The novel SF-BiS4D model processes images bidirectionally in spatial and frequency domains for improved distance regression in endomicroscopy.
Findings
The SF-BiS4D model outperforms existing methods in accuracy and stability for pCLE distance regression.
A guided trajectory planning strategy generates pseudo-distance labels for training sequential models.
Hierarchical guided fine-tuning reduces model size without sacrificing performance.
Abstract
Probe-based confocal laser endomicroscopy (pCLE) is a noninvasive technique that enables the direct visualization of tissue at a microscopic level in real time. One of the main challenges in using pCLE is maintaining the probe within a working range of micrometer scale. As a result, the need arises for automatically regressing the probe–tissue distance to enable precise robotic tissue scanning. In this paper, we propose the spatial frequency bidirectional structured state space model (SF-BiS4D) for pCLE probe–tissue distance regression. This model advances traditional state space models by processing image sequences bidirectionally and analyzing data in both the frequency and spatial domains. Additionally, we introduce a guided trajectory planning strategy that generates pseudo-distance labels, facilitating the training of sequential models to generate smooth and stable robotic…
Funding: Royal Society (http://dx.doi.org/10.13039/501100000288)
Introduction
In recent years, pCLE has emerged as a state-of-the-art biophotonics technique that enables real-time visualization of tissue cellular morphology. This noninvasive imaging method has demonstrated great potential for tissue characterization during tumor resection, greatly enhancing the precision and effectiveness of oncological surgeries, such as open neurosurgery [1–3]. However, maintaining the ideal working distance between the tissue and the pCLE probe, which is of micrometer scale, presents a significant ergonomic challenge during manual tissue scanning. Precise scanning can be achieved through microsurgical robotic manipulation, and the automatic estimation of the probe–tissue distance and orientation has attracted significant interest recently [4–8].
The field of medical imaging has experienced significant advancements with the integration of deep learning technologies, particularly in developing regression networks aimed at automating the focusing process in confocal microscopy. Research in this area has led to a variety of innovative approaches. For example, Zhenbo et al. [9] and Tomi et al. [10] adapted conventional convolutional neural network (CNN) models to accurately predict the optimal focus distance for capturing in-focus images. Diverging from methods reliant on spatial domain information, Jiang et al. [11] introduced a novel preprocessing technique that converts images into a multi-domain representation, subsequently fed to a regression CNN model to enhance focusing accuracy. Similarly, Zhang et al. [12] employed Sobel filters in various orientations to produce gradient images. These images serve as inputs to a diversity-learning network, which determines the precise focus distance.
Maintaining the pCLE probe within its working range has proven to be more challenging than in traditional confocal microscopy, because improvements in image clarity become less apparent as the probe approaches the optimal position [8]. This is in contrast to the more linear relationship observed in conventional microscopy methods. Furthermore, the pronounced noise in pCLE data complicates accurate distance estimation with deep learning networks. To address these challenges, Xu et al. developed the SFFC-Net, which leverages both frequency and spatial domain features to enhance distance regression accuracy [5]. Additionally, they developed a generative adversarial network (GAN) and a sequence-attention (SA) module [6] to incorporate robust image-based supervision and temporal information, respectively. However, for short image sequences, attention-based modules can easily memorize the training data rather than learn generalizable patterns [13]. This may affect the stability of the regression.
In the literature, temporal models, like recurrent neural networks (RNNs), have been proposed for the analysis of sequential images, being able to utilize temporal information effectively [14–16]. The recently proposed state space model (SSM) provides a parametric framework for mapping input information to output predictions, being suitable for processing time-series data across various domains [17]. Notably, Gu et al. developed the structured state space (S4) model [18] and its variant Mamba [19], which have shown superior efficiency in extracting temporal information from sequential data.
In this work, we designed and implemented a regression framework for visual servoing in robot-assisted pCLE tissue scanning. (1) To enforce stability during scanning, we propose a novel distance regression method which fuses spatial–frequency and temporal information. To learn temporal information, unlike the sequence-attention layer proposed in [6], we developed the bidirectional structured state space model (BiS4D), which advances the SSM by processing image sequences bidirectionally and analyzing data representations in both the frequency and spatial domains to improve the stability of model regression. (2) To enable the sequential model to learn smooth and stable probe trajectories for robotic scanning, instead of using the ground truth distance labels [5, 6, 14–16], a novel guided trajectory planning strategy is designed to generate pseudo-distance labels for sequential model training. (3) To reduce inference time, a hierarchical guided fine-tuning approach is introduced to reduce the size of the BiS4D model while maintaining performance. The proposed method has been extensively validated on ex vivo pCLE data and has shown superior performance in terms of stability and accuracy.
Methodology
Fig. 1. The overall framework of SF-BiS4D. (Top) The Training Phase. $\bigoplus$ represents the concatenation operation. Here, the raw image is concatenated with the image sequence $I_{seq}$ retrieved from the PRD dataset. $\mathcal{E}$ represents the feature encoder between the feature extractor and the BiS4D. (Bottom) The Inference Phase. If fewer than 10 frames are available, we use all the available frames.
The proposed framework is composed of two distinct phases, namely, a training phase, where the model learns to predict probe–tissue distances, and an inference phase, where the model guides the probe to converge to the optimal scanning position and capture pCLE data.
As shown in Fig. 1, during the training phase, each input contains the initial probe position $P_0$ relative to the tissue surface and the corresponding pCLE image. Initially, the guided trajectory generator $\mathcal{G}_T(\cdot)$ uses $P_0$ to produce a series of positions, leading to the optimal scanning position, forming the guided trajectory $\mathbf{P}_{seq} = \mathcal{G}_T(P_0)$. Corresponding pCLE images $\mathbf{I}_{seq}$ from these positions are then extracted from the PRD dataset video, normalized, and fed into the SF-BiS4D model.
The pretrained DR-GAN $\mathcal{F}_{FE}(\cdot)$ [6] extracts spatial and frequency domain features from each image ($u_i = \mathcal{F}_{FE}(\mathbf{I}^i_{seq})$). These features are concatenated into feature sequences $\mathbf{U}_{seq}$ and processed by the encoder $\boldsymbol{\varepsilon}(\cdot)$ for feature fusion.
The BiS4D model $\mathcal{F}_{T}(\cdot)$ then analyzes the fused features to predict the probe–tissue distance for each sequence image ($d_{pred} = \mathcal{F}_{T}(\boldsymbol{\varepsilon}(\mathbf{U}_{seq}))$), using a loss function $\mathcal{L}_{MM}(d_{pred}, d_{GT})$ to minimize the error between predicted and ground truth distances.
In the inference phase, the trained SF-BiS4D model analyzes the initially acquired pCLE image $I_0$ to predict the probe–tissue distance $d^0_{pred} = \mathcal{F}_{T}(\boldsymbol{\varepsilon}(\mathcal{F}_{FE}(I_0)))$. Based on $d^0_{pred}$, the position of the probe is adjusted with respect to the tissue surface, and a new pCLE image $I_1$ is acquired and added to the sequence $\mathbf{I}_{seq}$ for subsequent distance prediction. This process is repeated until convergence (when the probe is predicted to be stabilized within the working range), as outlined in the inference phase in Fig. 1.
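The closed-loop inference procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `acquire_image` and `predict_distance` are hypothetical callbacks standing in for the pCLE acquisition and the trained SF-BiS4D model, the update rule (move the probe by the negative of the predicted distance) and the stopping threshold `tolerance_um` are assumptions, and the 10-frame window follows the caption of Fig. 1.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

MAX_FRAMES = 10  # window length taken from the Fig. 1 caption


def servo_to_focus(
    acquire_image: Callable[[float], object],
    predict_distance: Callable[[Sequence[object]], float],
    start_position: float,
    tolerance_um: float = 5.0,  # hypothetical stopping threshold
    max_steps: int = 20,
) -> List[float]:
    """Predict the distance, move the probe, append the new frame, repeat."""
    frames: Deque[object] = deque(maxlen=MAX_FRAMES)  # keeps only the latest frames
    position = start_position
    trajectory = [position]
    for _ in range(max_steps):
        frames.append(acquire_image(position))
        # Stands in for F_T(eps(F_FE(.))) applied to the current image sequence.
        d_pred = predict_distance(list(frames))
        if abs(d_pred) <= tolerance_um:
            break  # assumed convergence test
        position -= d_pred  # assumed update rule
        trajectory.append(position)
    return trajectory
```

With fewer than 10 frames available, the `deque` simply holds all frames acquired so far, which matches the behaviour described in the caption.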
Spatial frequency bidirectional structured state space model (SF-BiS4D)
To extract efficient pCLE data representations, the DR-GAN feature extractor is used to generate spatial and frequency domain features (SF). Frequency features are important as they contain information related to image blurriness and therefore to the distance of the probe from the tissue surface (i.e., blurry images are associated with high probe–tissue distance and contain more low-frequency information) [5].
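The link between blur and frequency content can be illustrated with a toy one-dimensional example. The sketch below is not the DR-GAN feature extractor; it only shows, under simplifying assumptions, that smoothing a signal shifts its spectral energy toward low frequencies. The function names `low_frequency_ratio` and `box_blur` are hypothetical, and a three-tap moving average is used as a crude stand-in for defocus blur.

```python
import cmath
import math
from typing import List, Sequence


def low_frequency_ratio(signal: Sequence[float], cutoff: int) -> float:
    """Fraction of non-DC spectral energy at frequency indices <= cutoff."""
    n = len(signal)
    low = total = 0.0
    for k in range(1, n):  # skip the DC term k = 0
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy = abs(coeff) ** 2
        total += energy
        if min(k, n - k) <= cutoff:  # bins k and n - k share one frequency
            low += energy
    return low / total  # undefined (division by zero) for a constant signal


def box_blur(signal: Sequence[float]) -> List[float]:
    """Three-tap circular moving average."""
    n = len(signal)
    return [(signal[i - 1] + signal[i] + signal[(i + 1) % n]) / 3
            for i in range(n)]
```

For a signal that mixes a slow sinusoid with a fast alternating component, `low_frequency_ratio(box_blur(s), 2)` is larger than `low_frequency_ratio(s, 2)`, because the moving average attenuates the fast component far more than the slow one.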
To regress the distance between the pCLE probe and the tissue surface, we designed the BiS4D model. Our model extends the S4D model [18] to process an image sequence in both forward and backward directions. This enables the model to capture data dependencies not only from past to future but also from future to past. To achieve this, two parallel stacks have been created, namely, the forward S4D layers and the backward S4D layers. Image sequences are processed in forward and reverse order separately by these two stacks.
Fig. 2. The structure of BiS4D. (Left) A single bidirectional layer BiS4D. (Right) The schematic diagram of the multilayer BiS4D network.
The structure of a single bidirectional layer BiS4D is depicted in Fig. 2 (Left). For a sequence of image features $\mathbf{U}_{seq}$ of length $L$, the forward S4D branch sequentially processes the input. Conversely, the backward S4D branch processes the reversed feature sequence $\overleftarrow{\mathbf{U}_{seq}}$, thereby capturing temporal dependencies in the opposite direction. Subsequently, the model combines the results of both branches and resizes the combined output to match the input size of the regressor.
To improve the accuracy of the regression, multiple forward and backward S4D layers can be integrated as depicted in Fig. 2 (Right) and described in Algorithm 1. The outputs of the forward and backward S4D modules at each layer are integrated sequentially and subsequently fed to the next layer. In Algorithm 1, $\mathbf{U}_{fwd}$ and $\mathbf{U}_{bwd}$ are the forward and backward sequential features, respectively.
Algorithm 1 Bidirectional S4D Algorithm
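The bidirectional processing described above can be sketched in plain Python. This is an illustration under stated assumptions rather than a transcription of Algorithm 1: a real-valued diagonal linear recurrence stands in for the S4D layer of [18], each time step carries a single scalar feature, and the two branches are combined by averaging, which is an assumption because the combination rule is not spelled out here. All names (`diagonal_ssm_scan`, `bis4d_layer`, `bis4d_stack`, `Params`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# (a, b, c): diagonal state transition, input weights, output weights
Params = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def diagonal_ssm_scan(u: Sequence[float], params: Params) -> List[float]:
    """Run x_k = a * x_{k-1} + b * u_k and y_k = sum(c * x_k) over the sequence."""
    a, b, c = params
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:
        state = [a_n * x_n + b_n * u_k for a_n, b_n, x_n in zip(a, b, state)]
        outputs.append(sum(c_n * x_n for c_n, x_n in zip(c, state)))
    return outputs


def bis4d_layer(u: Sequence[float], fwd: Params, bwd: Params) -> List[float]:
    """One bidirectional layer: scan forward, scan the reversed input, combine."""
    u_fwd = diagonal_ssm_scan(u, fwd)
    # Scan the reversed sequence, then flip back so time steps line up again.
    u_bwd = list(reversed(diagonal_ssm_scan(list(reversed(u)), bwd)))
    return [0.5 * (f + b) for f, b in zip(u_fwd, u_bwd)]  # assumed: average


def bis4d_stack(u: Sequence[float], layers: Sequence[Tuple[Params, Params]]) -> List[float]:
    """Feed the combined output of each layer to the next one."""
    out = list(u)
    for fwd, bwd in layers:
        out = bis4d_layer(out, fwd, bwd)
    return out
```

When the forward and backward branches share parameters, reversing the input reverses the output, which makes the symmetric treatment of past and future context easy to check.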
Guided trajectory planning strategy
The aim of this component is to generate pairs of pCLE images and pseudo-distance labels that simulate smooth probe trajectories. Training on these pairs enables our sequential model BiS4D to learn smooth and stable probe trajectories for robotic scanning, so that the probe converges to the optimal position within the working range. For this purpose, we propose a guided distance trajectory planning strategy, the so-called exponential distance discretization (EDD).
Exponential Distance Discretization. For a given initial position $P_0 \in [-400\,\mu m, 400\,\mu m]$, we generate a sequence of positions $P_i$ $(i = 0, 1, \ldots, N)$ which represent probe distances with respect to the tissue surface. These distances are distributed exponentially across $N$ steps and calculated as:
$$P_i = P_0 \cdot \exp\left(\frac{i}{N-1} \cdot \ln\left(\frac{P_0}{\alpha}\right)\right) \tag{1}$$
where $\alpha$ is a hyperparameter that controls the speed of convergence to the optimal position; it is set to 0.1 for fast convergence.
However, Eq. (1) only covers the case $P_0 > 0$. Consequently, to accommodate both positive and negative positions, $P_i$ is generated as:
$$P_i = \begin{cases} -|P_0| \cdot \exp\left(\frac{i}{N-1} \cdot \ln\left(\frac{|P_0|}{\alpha}\right)\right) & \text{if } P_0 < 0, \\ 0 & \text{if } P_0 = 0, \\ P_0 \cdot \exp\left(\frac{i}{N-1} \cdot \ln\left(\frac{P_0}{\alpha}\right)\right) & \text{if } P_0 > 0. \end{cases} \tag{2}$$
Subsequently, the generated position sequences are discretized by rounding them down to a multiple of 5 as in Eq. (3), ensuring compatibility with the ground truth distance labels, which have been generated using a distance step equal to $5\,\mu m$.
$$P_i^{discretized} = \left\lfloor \frac{P_i}{5} \right\rfloor \cdot 5 \tag{3}$$
where the operator $\lfloor \cdot \rfloor$ represents the rounding down (floor) function.
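The trajectory generation and discretization can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and two points are assumptions of the sketch. First, read literally, the exponent in Eq. (1) is positive whenever $|P_0| > \alpha$, so $|P_i|$ would grow with $i$; the code therefore uses the ratio $\alpha/|P_0|$ inside the logarithm so that the sequence decays from $P_0$ toward $\pm\alpha$. Second, the index runs over $i = 0, \ldots, N-1$ so that the last position equals $\pm\alpha$ before discretization. The function name `edd_trajectory` and its parameters are hypothetical.

```python
import math
from typing import List


def edd_trajectory(p0: float, n_steps: int, alpha: float = 0.1,
                   step_um: float = 5.0) -> List[float]:
    """Pseudo-distance labels decaying geometrically from p0 toward +/-alpha."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if p0 == 0:
        return [0.0] * n_steps  # middle case of the piecewise definition
    sign = 1.0 if p0 > 0 else -1.0
    magnitude = abs(p0)
    # ASSUMPTION: alpha / |P0| (inverted relative to the printed ratio), so the
    # exponent is negative and the positions approach the optimal position.
    log_ratio = math.log(alpha / magnitude)
    positions = [sign * magnitude * math.exp(i / (n_steps - 1) * log_ratio)
                 for i in range(n_steps)]
    # Floor to a multiple of the 5 um label step, as in the discretization equation.
    return [math.floor(p / step_um) * step_um for p in positions]
```

For example, `edd_trajectory(400, 5)` yields `[400.0, 50.0, 5.0, 0.0, 0.0]`. Because the floor always rounds toward minus infinity, a negative start settles one step below zero: `edd_trajectory(-400, 5)` yields `[-400.0, -55.0, -10.0, -5.0, -5.0]`.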
Hierarchical guided fine-tuning
To mitigate the computational complexity of the BiS4D model, we designed a fine-tuning approach. This involves reducing the number of bidirectional layers to 1. To preserve performance despite this layer reduction, we also incorporated the pretrained DR-GAN [6] as a frozen feature extractor, while fine-tuning the encoder $\boldsymbol{\varepsilon}$ within the BiS4D model. Although this modification, namely, F-SF-BiS4D, maintained stability, it significantly slowed the convergence rate, as evidenced by the $\mathrm{MAE}_{1st}$ in Table 3.
In response to this challenge, we designed the guided controller, which controls the activation of the DR-GAN regressor and the deactivation of the F-SF-BiS4D model during the inference phase. Initially, in the absence of substantial temporal data, such as at the early stages of tissue scanning, the guided controller (G) utilizes the DR-GAN regressor to predict the probe–tissue distance for the first three scanning steps. The use of the DR-GAN regressor allows the model to converge to the valid working range rapidly. Once sufficient temporal information becomes available after the initial 3 steps, the guided controller is deactivated, allowing the F-SF-BiS4D to operate. This hierarchical inference design ensures that the guided fine-tuned SF-BiS4D model (GF-SF-BiS4D) integrates both guided initialization and fine-tuning, optimizing both speed of convergence and stability.
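The switching behaviour of the guided controller can be sketched as a simple dispatch on the step index. This is an illustrative sketch only: `single_image_regressor` and `sequence_regressor` are hypothetical callables standing in for the DR-GAN regressor and the F-SF-BiS4D model, and steps are assumed to be counted from zero.

```python
from typing import Callable, Sequence

GUIDED_STEPS = 3  # the DR-GAN regressor handles the first three scanning steps


def guided_predict(
    step: int,
    frames: Sequence[object],
    single_image_regressor: Callable[[object], float],
    sequence_regressor: Callable[[Sequence[object]], float],
    guided_steps: int = GUIDED_STEPS,
) -> float:
    """Return the distance prediction from whichever model is active at this step."""
    if step < guided_steps:
        # Little temporal context yet: use only the most recent frame.
        return single_image_regressor(frames[-1])
    # Enough frames have accumulated: hand over to the sequential model.
    return sequence_regressor(frames)
```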
Loss function
To train our proposed regressor, the model has been optimized using the mean absolute error (MAE) and the mean absolute percentage error (MAPE), combined in our loss function as $\mathcal{L}_{MP} = \frac{1}{2} \cdot (\mathrm{MAE} + \mathrm{MAPE})$.
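The combined loss can be sketched in plain Python using the standard definitions of MAE and MAPE. This is an illustration, not the training code. One assumption is flagged explicitly: MAPE is undefined when a ground truth value is zero, and the handling of that case is not described here, so the sketch clamps the denominator with a hypothetical `eps` parameter.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-8) -> float:
    """Mean absolute percentage error, in percent.

    ASSUMPTION: |y_true| is clamped to at least eps to avoid division by zero.
    """
    ratios = (abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(ratios) / len(y_true)


def loss_mp(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Equal-weight combination: 0.5 * (MAE + MAPE)."""
    return 0.5 * (mae(y_true, y_pred) + mape(y_true, y_pred))
```

Because MAPE divides by the true distance, an error of a given size is penalized more heavily near the optimal position than far from it, which is the motivation given for including it.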
MAE. The MAE is a simple and robust loss function calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
where $n$ is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
MAPE. In situations where the distance between the pCLE probe and the tissue surface is within the optimal imaging range, it is desirable for the model to remain within this range and to predict the distance with the greatest possible accuracy. Thus, we use the MAPE defined as:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100$$
Experiments
PRD Dataset [5]. Ex vivo pig brain tissue, treated with 0.1% acriflavin, was scanned using a Z 1800 confocal miniprobe (Cellvizio, Mauna Kea Technologies, Paris). The miniprobe was operated using the Kinesis® K-Cube™ Stepper Motor Controller (Thorlabs, USA), covering a distance range of 400 $\mu$m to -400 $\mu$m from the tissue surface in steps of 5 $\mu$m. A total of 62 pCLE videos and their corresponding probe positions were recorded from independent samples. The optimal scanning position was identified by locating the pCLE frame with the least blur, confirmed through expert review by a neurosurgeon. According to the Z miniprobe’s specifications, its working range is between 35 $\mu$m and 400 $\mu$m from the tissue surface.
For the experiment, 50 videos (7,539 frames) were used for training, while 12 videos (1,706 frames) were reserved for testing.
Implementation details. All experiments were conducted on an NVIDIA RTX A5000 GPU with 24 GB of memory, using the PyTorch framework. The batch size during training was set to 8, and the same random seed was employed across all experiments. The Adam optimizer [20] was employed with a cyclic learning rate schedule [21], where the learning rate oscillates between $1 \times 10^{-5}$ and $1 \times 10^{-4}$ every 5 epochs, helping the model achieve better gradient descent performance. All the images used in the training and inference phases have been normalized following the same local and global normalization method previously proposed in [5]. In the experiments, all models are trained on the training set. All trained models are then frozen and evaluated on the testing set.
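The cyclic learning rate schedule can be sketched as a triangular wave. This is a sketch under an assumption: "every 5 epochs" is interpreted as the half-cycle length, so the rate rises from the lower to the upper bound over 5 epochs and falls back over the next 5. The function name `cyclic_lr` and the parameter `half_cycle` are hypothetical, and the exact waveform used in the experiments is not specified here.

```python
import math


def cyclic_lr(epoch: int, base_lr: float = 1e-5, max_lr: float = 1e-4,
              half_cycle: int = 5) -> float:
    """Triangular cyclic learning rate evaluated at a given epoch."""
    cycle = math.floor(1 + epoch / (2 * half_cycle))
    # x is 1 at the start and end of a cycle and 0 at its midpoint.
    x = abs(epoch / half_cycle - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

Under this reading, the rate equals `base_lr` at epochs 0, 10, 20 and reaches `max_lr` at epochs 5, 15, 25.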
Convergence and stability study. The convergence and stability study is aimed at validating whether the regression model can guide the robotic system to converge to the optimal scanning position and stabilize there during the robotic scanning. We followed the evaluation methods proposed in [5], using a K-step incremental analysis, with image feedback steps ranging from $k = 1$ to $K$. Each test pCLE image has a corresponding ground truth probe–tissue distance, which serves as the initial position $P_0$ for the K-step experiment. At each step $k$, the pCLE image is concatenated with the previous image sequence and fed to the model to predict the probe–tissue distance $d^k_{pred}$. The new probe position is then calculated based on the predicted distance. This process is repeated for 20 iterations and the resulting trajectories of the probe are analyzed using the evaluation metrics of convergence and stability.
The quality of convergence is assessed by the metrics $\mathrm{MAE}^C$ and BM, which are, respectively, the mean absolute error after convergence and the blurriness score of the pCLE images after convergence, estimated by the blur metric proposed in [6]. The stability of regression is evaluated by the width of the upper-lower bound after convergence (blue and red dotted lines in Fig. 3).
Fig. 3. K-step incremental analysis of SF-BiS4D and GA-SA-RBF on the testing set. The figure illustrates the positions of the pCLE probe predicted by the regression models. Two different trajectories are presented for initial images taken from the test set, located at 385 $\mu$m and -320 $\mu$m, which are presented as lines with red and blue triangles, respectively. Within the first four steps, the probe converges within the working range. As the probe can acquire high-quality images at this position, the model for the next steps stabilizes the probe at this position and predicts small subsequent movements.
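The convergence and stability metrics can be computed from a recorded probe trajectory as sketched below. The exact definitions follow [5, 6] and are not reproduced here, so the sketch makes two assumptions: the step from which the probe counts as converged is supplied by the caller (`converged_from`, hypothetical), and the bound width is taken as the difference between the largest and smallest positions after that step. The blur metric BM is not covered.

```python
from typing import Sequence, Tuple


def convergence_metrics(trajectory: Sequence[float], converged_from: int,
                        target: float = 0.0) -> Tuple[float, float]:
    """Return (MAE after convergence, width of the upper-lower bound)."""
    tail = trajectory[converged_from:]
    if not tail:
        raise ValueError("no positions after the convergence step")
    # Mean absolute deviation of the converged positions from the target.
    mae_c = sum(abs(p - target) for p in tail) / len(tail)
    # Assumed bound width: spread of the positions once converged.
    bound_width = max(tail) - min(tail)
    return mae_c, bound_width
```

A narrow bound with a small error indicates that the model both reaches the optimal position and holds the probe there, which is the behaviour the K-step analysis is designed to verify.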
Comparison study
To ensure consistency with established benchmarks proposed in [6], we evaluated the performance of our model using the same evaluation criteria defined in [6], namely, $\mathrm{MAE}_{1st}$, $\mathrm{MAE}^C$, $\mathrm{ACC}_{dir}$, $W^B$, and BM. Our model was compared against SOTA single-image distance regression models, SFFC-Net [5] and DR-GAN [6], as well as the temporal information-enhanced GA-SA-RBF model [6].
As depicted in Table 1, the SF-BiS4D with three bidirectional layers outperforms both SFFC-Net and DR-GAN in terms of $\mathrm{MAE}_{1st}$ and $\mathrm{ACC}_{dir}$, demonstrating faster convergence (also verified in Fig. 3). Moreover, when compared to the GA-SA-RBF model, both our SF-BiS4D and the GF-SF-BiS4D models show significantly improved performance in $\mathrm{MAE}^C_{20}$, $BM_{20}$, and $W^B_{20}$. As shown in Table 2, a paired t test comparing SFFC-Net, DR-GAN, and GA-SA-RBF with our models (SF-BiS4D, GF-SF-BiS4D) showed p values significantly below 0.05 for all metrics with SFFC-Net and DR-GAN, confirming statistical significance.
For GA-SA-RBF, p values were below 0.05 for $\mathrm{MAE}^C_{20}$ and $W^B_{20}$, indicating our models’ superiority, while $BM_{20}$ showed comparable performance with near-zero t-statistic value. The stability of the trajectories produced by SF-BiS4D, as shown in Fig. 3, is notably higher than those from the GA-SA-RBF model. This indicates that our BiS4D sequential model is more effective at learning temporal information than the attention-based GA-SA-RBF model.

Table 1. Comparison of regression models. The unit of $W^B$, $\mathrm{MAE}_{1st}$, and $\mathrm{MAE}^C$ is $\mu$m. The Params are the overall parameters of the model and t is the inference time per frame (ms). The best results are highlighted in bold, and the second-best results are underlined

| Network | $\mathrm{MAE}_{1st} \downarrow$ | $\mathrm{ACC}_{dir} \uparrow$ | $\mathrm{MAE}^C_{20} \downarrow$ | $BM_{20} \uparrow$ | $W^B_{20} \downarrow$ | Params $\downarrow$ | t |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SFFC-Net | 64.29 | 94.49% | 41.19 ± 0.91 | 0.905 | 113.94 | **14 M** | **13.6** |
| DR-GAN | 63.68 | 94.55% | 32.99 ± 0.97 | 0.920 | 70.68 | **14 M** | **13.6** |
| GA-SA-RBF | 64.30 | 94.90% | 31.11 ± 0.31 | 0.941 | 67.65 | 18.3 M | 18.3 |
| SF-BiS4D | **63.57** | **95.43%** | $\underline{28.73 \pm 0.17}$ | 0.941 | 55.59 | 17.1 M | 19.7 |
| GF-SF-BiS4D | 63.68 | 94.55% | $\mathbf{25.32} \pm \mathbf{0.07}$ | 0.940 | 50.96 | 16.4 M | 15.7 |

Table 2. Paired t test of SOTA methods vs SF-BiS4D and GF-SF-BiS4D. "t-statistic" is the t-statistic value, and $p<0.05$ shows whether the condition "p value < 0.05" is satisfied to prove the statistical significance. $\checkmark$ means statistical significance and $\times$ means no statistical significance

| Comparison | Metrics | SFFC-Net t-statistic | SFFC-Net $p<0.05$ | DR-GAN t-statistic | DR-GAN $p<0.05$ | GA-SA-RBF t-statistic | GA-SA-RBF $p<0.05$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SOTA models vs SF-BiS4D | $\mathrm{MAE}^C_{20}$ | 555.93 | $\checkmark$ | 178.67 | $\checkmark$ | 278.04 | $\checkmark$ |
| SOTA models vs SF-BiS4D | $BM_{20}$ | 21.06 | $\checkmark$ | 38.79 | $\checkmark$ | 0 | $\times$ |
| SOTA models vs SF-BiS4D | $W^B_{20}$ | 2603.39 | $\checkmark$ | 632.90 | $\checkmark$ | 1408.91 | $\checkmark$ |
| SOTA models vs GF-SF-BiS4D | $\mathrm{MAE}^C_{20}$ | 718.20 | $\checkmark$ | 325.75 | $\checkmark$ | 752.50 | $\checkmark$ |
| SOTA models vs GF-SF-BiS4D | $BM_{20}$ | 21.87 | $\checkmark$ | 40.64 | $\checkmark$ | 1.85 | $\times$ |
| SOTA models vs GF-SF-BiS4D | $W^B_{20}$ | 2850.16 | $\checkmark$ | 837.52 | $\checkmark$ | 2169.13 | $\checkmark$ |
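For readers unfamiliar with the paired t test reported in Table 2, the statistic can be sketched with the Python standard library alone. The sample values below are invented for illustration and are not taken from the paper, and `paired_t_statistic` is a hypothetical helper name. The p value would then be obtained from Student's t distribution with $n-1$ degrees of freedom, which this sketch leaves out.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired t statistic for two equally long lists of matched measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0.0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Invented per-run errors for a baseline and a proposed model.
baseline = [32.0, 34.0, 31.0, 35.0, 33.0]
proposed = [28.0, 29.0, 28.0, 30.0, 29.0]
t_clear = paired_t_statistic(baseline, proposed)

# Two samples whose paired differences average to zero.
t_zero = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
print(round(t_clear, 3), t_zero)
```

The first pair yields a t statistic of about 11.2, well above the two-sided 5% critical value of roughly 2.78 for four degrees of freedom. The second pair yields exactly zero, the situation described in the text as comparable performance.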
Ablation Study
An ablation study was conducted to evaluate the contribution of different strategies to our regression model’s performance, using the DR-GAN as the baseline.

Table 3. Ablation study of the proposed strategies. GTP, FT, G represent guided trajectory planning, fine-tuning, and guided controller. The symbol $^\mathrm{w/o}$ represents the model trained without GTP. $\checkmark$ represents the inclusion of a strategy. The best results are highlighted in bold, and the second-best results are underlined

| Network | BiS4D | GTP | FT | G | $\mathrm{MAE}_{1st}$ | $\mathrm{ACC}_{dir}$ | $\mathrm{MAE}^C_{20}$ | $BM_{20}$ | $W^B_{20}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DR-GAN | | | | | 63.68 | 94.55% | 32.99 ± 0.97 | 0.920 | 70.68 |
| S-BiS4D | $\checkmark$ | | | | 78.67 | 95.25% | 30.40 ± 0.66 | 0.889 | 73.26 |
| SF-BiS4D$^\mathrm{w/o}$ | $\checkmark$ | | | | 66.55 | 95.13% | 30.26 ± 0.48 | 0.901 | 70.68 |
| SF-BiS4D | $\checkmark$ | $\checkmark$ | | | **63.57** | **95.43%** | 28.73 ± 0.17 | 0.941 | 55.59 |
| F-SF-BiS4D | $\checkmark$ | $\checkmark$ | $\checkmark$ | | 72.76 | 95.25% | 25.46 ± 0.07 | 0.932 | 45.96 |
| G-SF-BiS4D | $\checkmark$ | $\checkmark$ | | $\checkmark$ | 63.68 | 94.55% | 28.82 ± 0.23 | 0.942 | 56.37 |
| GF-SF-BiS4D | $\checkmark$ | $\checkmark$ | $\checkmark$ | $\checkmark$ | 63.68 | 94.55% | $\mathbf{25.32} \pm \mathbf{0.07}$ | 0.940 | 50.96 |

Table 4. Ablation study of hyperparameters and loss functions. $L$ is the sequence length for the BiS4D sequential model. $N_S$ is the number of steps during which the guided controller is activated in GF-SF-BiS4D. $\mathcal{L}$ is the loss function used in training SF-BiS4D, where $\mathcal{L}_{M}$ is MAE, $\mathcal{L}_{P}$ is MAPE, $\mathcal{L}_{MM}$ is MAE+MAPE, and $\mathcal{L}_{L}$ is the likelihood loss function. The best results are highlighted in bold

| Setting | Value | $\mathrm{MAE}_{1st}$ | $\mathrm{ACC}_{dir}$ | $\mathrm{MAE}^C_{10}$ | $\mathrm{MAE}^C_{20}$ | $BM_{10}$ | $BM_{20}$ | $W^B_{10}$ | $W^B_{20}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $L$ | 5 | 72.51 | 96.13% | 37.93 ± 0.28 | 37.07 ± 0.17 | 0.895 | 0.896 | 88.38 | 85.24 |
| $L$ | 10 | 63.57 | 95.43% | 29.66 ± 0.48 | $\mathbf{28.73} \pm \mathbf{0.17}$ | **0.939** | **0.941** | **60.44** | **55.59** |
| $L$ | 15 | 66.47 | 94.96% | 31.04 ± 0.52 | 30.52 ± 0.31 | 0.916 | 0.926 | 74.57 | 69.92 |
| $L$ | 20 | 117.04 | 94.49% | 36.42 ± 6.41 | 34.45 ± 0.22 | 0.906 | 0.917 | 144.90 | 90.28 |
| $N_S$ | 0 | 72.76 | 95.25% | 27.23 ± 0.94 | 25.46 ± 0.07 | 0.925 | 0.932 | 63.43 | 45.96 |
| $N_S$ | 1 | 63.68 | 94.55% | 27.82 ± 0.79 | 26.50 ± 0.11 | 0.930 | 0.935 | 57.86 | 56.42 |
| $N_S$ | 2 | 63.68 | 94.55% | 27.13 ± 0.50 | 25.93 ± 0.06 | 0.935 | 0.939 | 57.32 | 55.51 |
| $N_S$ | 3 | 63.68 | 94.55% | 26.99 ± 0.82 | 25.32 ± 0.07 | **0.937** | **0.940** | **54.78** | **50.96** |
| $N_S$ | 4 | 63.68 | 94.55% | 27.29 ± 1.25 | 25.77 ± 0.08 | 0.927 | 0.934 | 64.17 | 61.33 |
| $\mathcal{L}$ | $\mathcal{L}_{M}$ | 81.10 | **96.24%** | 33.86 ± 1.30 | 30.42 ± 0.47 | 0.909 | 0.936 | 90.80 | 66.80 |
| $\mathcal{L}$ | $\mathcal{L}_{P}$ | 67.10 | 94.95% | 30.00 ± 0.61 | 28.16 ± 0.18 | 0.930 | 0.940 | 72.02 | 56.39 |
| $\mathcal{L}$ | $\mathcal{L}_{MM}$ | 63.57 | 95.43% | $\mathbf{29.66} \pm \mathbf{0.48}$ | $\mathbf{28.73} \pm \mathbf{0.17}$ | **0.939** | **0.941** | **60.44** | **55.59** |
| $\mathcal{L}$ | $\mathcal{L}_{L}$ | 63.83 | 94.96% | 29.88 ± 0.99 | 31.93 ± 1.26 | 0.917 | 0.924 | 73.26 | 72.86 |
As demonstrated in Table 3, integrating our BiS4D sequential model and guided trajectory planning (GTP) strategy into the pretrained DR-GAN feature extractor significantly enhances both convergence and stability (rows 4–6). To evaluate the impact of feature domains on distance regression, we compared models which use only spatial domain features (S-BiS4D) against those which incorporate both spatial and frequency domain features (SF-BiS4D) in rows 2 and 4. The results show that SF-BiS4D surpasses S-BiS4D across all performance metrics. Additionally, to understand the specific contribution of GTP, we compared the performance of the SF-BiS4D model with and without the GTP strategy (rows 3 and 4). The results indicate a significantly higher $BM_{20}$ and lower $W^B_{20}$ for SF-BiS4D with GTP, suggesting that this strategy not only stabilizes trajectories but also enhances the quality of pCLE images after convergence. Furthermore, rows 5–7 illustrate that the guided controller (G) accelerates convergence, while fine-tuning (FT) boosts stability.
We also conducted ablation experiments on the hyperparameters, which include the sequence length $L$ for sequential models and the number of steps $N_S$ after which the guided controller G is deactivated in GF-SF-BiS4D. Additionally, we compared various loss functions: the MAE loss ($\mathcal{L}_{M}$), the MAPE loss ($\mathcal{L}_{P}$), the MAE+MAPE loss ($\mathcal{L}_{MM}$), and the likelihood-based mean squared error ($\mathcal{L}_{L}$) [22], as detailed in the final section of Table 4. Results in Table 4 indicate that the BiS4D sequential model exhibits optimal regression accuracy and stability with an input sequence length $L$ of 10. For the guided controller G, the GF-SF-BiS4D model has the highest stability of regression when G is activated in the first three steps.
Furthermore, the SF-BiS4D model trained with $\mathcal{L}_{MM}$ demonstrates superior performance in both accuracy and stability, as shown in the last section of Table 4.
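To make the loss comparison concrete, the following is a minimal sketch of the three loss families named above, written in plain Python. It is not the authors' implementation: the function names, the `mape_weight` factor, the `eps` guard, and the fixed `log_sigma` parameter are assumptions introduced for illustration, and the Gaussian negative log-likelihood is only one common form of a likelihood-based squared-error loss, with the constant term dropped. The text does not specify how the MAE and MAPE terms are weighted, so an unweighted sum is assumed by default.

```python
import math


def mae(pred, target):
    """Mean absolute error between predicted and reference distances."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mape(pred, target, eps=1e-8):
    """Mean absolute percentage error, returned as a fraction.

    eps (an assumption) guards against division by near-zero targets.
    """
    return sum(abs(p - t) / max(abs(t), eps) for p, t in zip(pred, target)) / len(pred)


def mae_mape(pred, target, mape_weight=1.0):
    """Combined MAE + MAPE loss; mape_weight is a hypothetical balancing factor."""
    return mae(pred, target) + mape_weight * mape(pred, target)


def gaussian_nll(pred, target, log_sigma=0.0):
    """Gaussian negative log-likelihood with a scale parameter (constant dropped).

    In a likelihood-based setup log_sigma would be learned with the model;
    here it is a fixed argument to keep the sketch self-contained.
    """
    sigma = math.exp(log_sigma)
    return sum(
        0.5 * ((p - t) / sigma) ** 2 + log_sigma for p, t in zip(pred, target)
    ) / len(pred)


# Hypothetical probe-tissue distances in micrometers, for illustration only.
predicted = [110.0, 190.0]
reference = [100.0, 200.0]

loss_m = mae(predicted, reference)
loss_mm = mae_mape(predicted, reference)
loss_l = gaussian_nll(predicted, reference)
```

The MAPE term scales each error by the reference distance, so the combined loss penalizes a given absolute error more heavily when the probe is close to the tissue than when it is far away.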
Conclusion
This paper introduces the spatial frequency bidirectional structured state space model (SF-BiS4D), a novel deep learning framework for guiding robot-assisted endomicroscopy with pCLE. Our performance evaluation shows that the proposed regression model can be deployed in real time with an accuracy that falls within the valid working range. So far, the model has been trained only on ex vivo animal tissue. Our next step is to expand the training dataset with pCLE data captured from human tissue, which should make the model more robust to different tissue morphologies and improve its generalizability. In future work, the proposed probe–tissue distance regression model will also be integrated with our previously developed orientation regression model [7]. Together, they will be deployed in our high-accuracy 6-DoF robotic system to achieve fully automated robotic tissue scanning.
Supplementary Information
Below is the link to the electronic supplementary material. Supplementary file 1 (PDF, 490 KB)
References
- 1. Xu C, Roddan A, Xu H, Giannarou S (2024) FF-ViT: probe orientation regression for robot-assisted endomicroscopy tissue scanning. IJCARS, 1–9. doi:10.1007/s11548-024-03113-2. PMCID: PMC11582276. PMID: 38598141
- 2. Xu X, Zhao S, Gong L, Zuo S (2024) A novel contact optimization algorithm for endomicroscopic surface scanning. IJCARS, 1–11. doi:10.1007/s11548-024-03223-x. PMID: 38970745
- 3. Mosbach M, Andriushchenko M, Klakow D (2020) On the stability of fine-tuning BERT: Misconceptions, explanations, and strong baselines. arXiv preprint arXiv:2006.04884
- 4. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
- 5. Gu A, Dao T (2023) Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752
- 6. Kingma DP (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
- 7. Hamilton M, Shelhamer E, Freeman WT (2020) It is likely that your loss should be a likelihood. arXiv preprint arXiv:2007.06059
