Causal dynamic decision-making for robotic systems in non-Markovian high-difficulty surgery

Guo Na; Tan Minghui; Li Tiantian; Liu Yang; Zhang Qinjian; Li Yuanxin; Xu Tianlei; Sun Fuchun

PMC · DOI: 10.3389/fneur.2026.1767832 · February 20, 2026

Open Access

TL;DR

This paper introduces a new causal modeling framework for surgical robots to handle unpredictable intraoperative events like sudden bleeding or instrument loss.

Contribution

A novel causal dynamic decision-making framework using VAR and Granger causality for non-Markovian surgical scenarios.

Findings

01 The framework achieves 95.60% accuracy on 5,000 samples and 95.66% on 10,000 samples, with the MCC stable at 0.912.

02 Recall slightly exceeds precision, aligning with clinical safety priorities.

03 The method captures non-Markovian temporal correlations and is not limited to specific procedures.

Abstract

Markov assumption-based surgical decision models cannot account for the time-varying, irregular effects of high-risk intraoperative anomalies such as sudden hemorrhage or inadvertent instrument loss, making them inadequate for specialized procedures like neurosurgery and spinal interventions. To overcome the non-Markovian limitations of conventional surgical process modeling, this study develops a causal modeling framework based on Vector Autoregression (VAR) and Granger causality analysis. The framework constructs a causal chain (original gesture Si → abnormal event Ej → recovery action Zk ) to enable intelligent response and adaptive decision-making. Validation was performed on a large-scale synthetic dataset containing 10,000 samples (including anomaly, positive, and negative cases), and evaluated using accuracy, F1-score, and recall metrics. Experimental results show the…

Figures

Figure 1. Clinical value and schematic representation of the causal dynamic decision-making framework for robotic surgery in non-Markovian high-difficulty scenarios. (a) Surgical application value of the causal dynamic decision-making framework; (b) Surgical suturing process under normal conditions; (c) Surgical suturing procedures in abnormal situations (such as needle drops); (d) Syntax diagram for surgical tasks (26); (e) Execution errors under abnormal circumstances (27).

Figure 2. Implementation pipeline of the causal dynamic decision-making framework for surgical robots. First, key features of the surgical video are extracted to construct a characterization framework; the VAR model is then used for temporal correlation modeling; finally, the causal timing is verified by the Granger causality test, so as to identify causal relationships and provide theoretical support for the autonomous decision-making of surgical robots.

Figure 3. Confusion matrix analysis for different sample sizes. (a) The results of the model at a scale of 5,000 samples (2,500 positives and 2,500 negatives); (b) the results of the model at a scale of 10,000 samples (5,000 positives and 5,000 negatives).

Tables

Table 1. Analysis of experimental results.

Samples (positive/negative)	Accuracy	Precision	Recall	F1 score	MCC
2,500/2,500	95.60%	95.34%	95.88%	95.60%	0.912
5,000/5,000	95.66%	94.96%	96.44%	95.77%	0.912

Keywords

causal inference; dynamic decision-making; Granger causality; non-Markov processes; surgical robotics

Taxonomy

Topics: Surgical Simulation and Training · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education

Full text

1 Introduction

Decision-making is the core prerequisite for surgical robots to achieve autonomous operation. Recent advances in embodied intelligence (1) have propelled the field beyond traditional leader-follower control toward systems capable of environmental perception and autonomous decision-making. The early STAR robot, developed by the Johns Hopkins University team, utilized near-infrared fluorescence imaging for millimeter-level tissue perfusion identification (2, 3), while the more recent SRT-H system employs miniature cameras and precision control to achieve submillimeter accuracy in vascular clamping (4, 5). Frameworks such as VPPV have further demonstrated “zero-shot transfer” of operational strategies for multi-task decision-making (1).

However, existing surgical decision-making methods predominantly focus on conventional laparoscopic procedures such as cholecystectomy, with a primary emphasis on automating localized manipulations such as suturing and knot-tying. Systematic research into task-level autonomous decision-making for complex, high-risk surgical specialties, including neurosurgery and spinal surgery, remains scarce. Task-level surgical video data are often confounded by variables such as instrument slippage (illustrated for normal vs. abnormal scenarios in Figures 1b,c) (6) and tissue deformation (7), which exhibit distinct non-Markovian temporal characteristics. Conventional temporal models therefore cannot capture long-range causal dependencies, which constrains the adaptability and reliability of current systems in complex surgical settings. Deterministic methods such as behavior trees (8, 9), behavior networks (10), or task priority graphs (11) are too rigid to adapt to unexpected events, while probabilistic models based on Markov Decision Processes (MDPs) (12, 13) fail to capture long-range causal relationships. Although the “Transformer-based planning + imitation/reinforcement learning-based control” paradigm (4, 14) is mainstream, traditional reinforcement learning often optimizes statistical correlations rather than causal structure (15), leading to performance drops during Sim2Real transfer or sensor failure.

Causal reasoning, which transforms correlational strategies into causal ones (16, 17), provides new approaches to address partial observability (POMDP) and confounding factors in robotic perception. Work on Local Causal Models (LCMs) (18) and structural causal models (19, 20) has shown promise in improving sample efficiency and enabling zero-shot transfer in generic robotic tasks. However, these general causal inference methods are ill-suited for surgical scenarios: LCMs rely on locally factored dynamics and stationary environment assumptions, making them incapable of addressing abrupt, non-stationary abnormal events intraoperatively; structural causal models prioritize static causal graph construction, which hinders their ability to capture dynamic, long-term temporal dependencies arising from intraoperative disruptions. Consequently, they cannot be readily applied to surgical scenarios characterized by non-stationary state spaces and unforeseen anomalies. Surgical applications require models that enable safe, priority-driven dynamic causal inference under clinical logic constraints. Therefore, a solution specifically designed for the unique needs of surgical scenarios is urgently needed.

Inspired by work on local confounding detection (21) and adaptive methods (22) in non-stationary environments, this study models surgical robot decision-making as a sequential reasoning problem constrained by clinical logic. We propose a causal dynamic reasoning framework tailored to non-Markovian surgical environments, employing Granger causality inference (23, 24) as the core mechanism. By integrating it with the clinical logic of “surgical gesture – abnormal event – recovery action” and optimizing the lag order of the VAR model (25), the framework achieves safe, priority-based detection of dynamic abnormal events and supports autonomous decision-making.

This approach yields a framework with inherent generalization capabilities. By integrating the core Granger causality mechanism with transferable clinical logic (as schematically depicted in Figure 1a), this framework is not only directly applicable to diverse surgical scenarios such as neurosurgery and spinal procedures but also effectively addresses the non-Markovian dynamics and unexpected anomalies common in clinical practice.

2 Method

2.1 Framework overview and causal analysis

2.1.1 Dynamic causal characteristic analysis of surgical adverse events

Surgical abnormal events (such as instrument drops or tissue damage) interrupt predefined operational workflows (see the surgical task syntax diagram in Figure 1d), and strategies for adjusting to them rely on an accurate understanding of the “preceding actions - anomaly type - response operations” causal chain. Such anomalous events exhibit pronounced long-range temporal dependencies in non-Markovian environments and fall into two primary types: (1) procedural errors, manifested as omitted steps or steps performed out of sequence; and (2) execution errors, which are failures in single-step operations such as positioning deviations and loss of instrument control. In minimally invasive surgical environments, execution errors are particularly common due to constrained visual fields and anatomical variations, and they directly affect operational effectiveness and procedural continuity.

Taking the suturing task as an example (as shown in Figure 1e), the handling of execution errors depends causally on both the anomaly and the gesture in progress. For simplicity, it is assumed that each type of error occurs only once and can be successfully addressed:

Needle tip positioning abnormality (G2): When positioning is inaccurate, the system must repeat the positioning operation (G2) until it succeeds before proceeding with tissue puncture (G3);
Instrument drop event (E1): If an instrument drop is detected while the left hand is pulling the suture (G6), the retrieval operation takes precedence, after which the system returns to G6 and continues the procedure;
Instrument out-of-bounds event (E3): If the instrument moves beyond the field of view while the right hand is tightening the suture (G9), the instrument’s position must first be adjusted, after which the system decides whether to continue G9 or switch to another operation based on the actual situation.

Notably, these recovery pathways are dynamically determined by the interaction between anomaly type and the current operational context, emphasizing the requirement for autonomous systems to conduct real-time, context-aware causal reasoning in unstructured surgical environments.
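The context dependence of these recovery pathways can be illustrated with a small lookup keyed on both the anomaly and the gesture in progress. This is a minimal sketch, not the authors' implementation: the rule table, the step names (`RETRIEVE_INSTRUMENT`, `ADJUST_INSTRUMENT`, `DECIDE_G9_OR_SWITCH`), and the `G2_MISPOSITION` label are hypothetical and only mirror the three suturing cases listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: (anomaly, gesture in progress) -> ordered recovery steps.
# Gesture and event labels follow the suturing example; step names are illustrative.
RECOVERY_RULES: Dict[Tuple[str, str], List[str]] = {
    ("G2_MISPOSITION", "G2"): ["G2"],                          # repeat positioning before G3
    ("E1", "G6"): ["RETRIEVE_INSTRUMENT", "G6"],               # retrieve, then resume G6
    ("E3", "G9"): ["ADJUST_INSTRUMENT", "DECIDE_G9_OR_SWITCH"],
}


def recovery_plan(anomaly: str, current_gesture: str) -> List[str]:
    """Return the recovery steps for an anomaly in a given context, or [] if no rule applies."""
    return list(RECOVERY_RULES.get((anomaly, current_gesture), []))


plan = recovery_plan("E1", "G6")
```

The same anomaly in a different context (for example `E1` during `G9`) returns no plan here, which is the gap a data-driven causal model is meant to fill.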

2.1.2 Dynamic inference framework for surgical procedures based on Granger causality testing

To establish a dynamic decision-making mechanism suitable for the non-Markovian surgical environments described above, this paper proposes a three-stage causal dynamic reasoning framework based on Granger causality testing. The overall implementation pathway of this framework is illustrated in Figure 2, which outlines the process from surgical video structuring to Granger causality verification. This framework is designed to identify causal relationships between surgical phases and actions, providing a basis for the dynamic decision-making of surgical robots. It comprises the following core components:

1. Structured representation of surgical video streams

Using deep learning models, key surgical phases and action markers are extracted from surgical videos to construct a structured representation framework. This approach addresses the challenge of modeling phase-action associations in complex scenarios, providing a robust data foundation for subsequent causal analysis.

2. Construction of vector autoregressive (VAR) models to capture temporal dynamics

A Vector Autoregression (VAR) model is applied to capture temporal correlations. Compared to traditional Markov models, the VAR model more effectively captures long-range dynamic dependencies inherent in non-Markovian surgical processes, enabling more comprehensive temporal analysis.

3. Dynamic decision based on Granger causal testing

Granger causality testing is employed to validate genuine causal relationships by evaluating predictive capability differences. This method distinguishes between mere temporal correlations and true causal associations, providing a reliable autonomous decision-making strategy for surgical robots.

2.2 Structured characterization of surgical procedures

This study employs a “phase-action” coding method to convert continuous surgical videos into structured temporal sequences. The phase dimension combines the standard operational states being executed with any sudden abnormal events, while the action dimension represents the operations the robotic system can execute at the current moment. This integrated approach aligns with surgical cognitive patterns and clearly distinguishes routine operations from abnormal events.

Let there be $N$ standard operational states $S_1, \dots, S_N$, M random events $E_1, \dots, E_M$, and W executable operations $Z_1, \dots, Z_W$, which are defined as follows:

At any time t:

$S_i(t)$ denotes whether the i-th standard operational state is active. If $S_i$ = ‘Needle Holding’, $S_i(t) = 1$ indicates that the step ‘Needle Holding’ is being performed, and $S_i(t) = 0$ indicates it is not.

$E_j(t)$ denotes whether the j-th random event occurs. If $E_j$ = ‘Needle Dropping’, $E_j(t) = 1$ indicates that the abnormal event ‘Needle Dropping’ has occurred, and $E_j(t) = 0$ indicates it has not.

$Z_k(t)$ denotes whether the k-th executable action is triggered or recommended by the system. If $Z_k$ = ‘Needle Retrieval’, $Z_k(t) = 1$ indicates that the action is triggered, and $Z_k(t) = 0$ indicates it is not.

Accordingly, the surgical phase and action at time t can be described as:

[eqn]

[eqn]

Where, $[eqn]$ represents the stage variable at time $[eqn]$ , integrating both standard operational states and random events. $[eqn]$ denotes the action variable at time $[eqn]$ , comprising $[eqn]$ executable operational states $[eqn]$ .
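The phase-action coding described in this section can be sketched as a pair of binary indicator vectors per time step. This is only an illustration under stated assumptions: the label vocabularies, the function names, and the dictionary layout are hypothetical, and the order of the phase vector (states first, then events) is a choice made here.

```python
from typing import Dict, List, Sequence

# Hypothetical label vocabularies; a real system would derive them from annotation.
STATES = ["Needle Holding", "Suturing"]        # standard operational states
EVENTS = ["Needle Dropping"]                   # random (abnormal) events
ACTIONS = ["Needle Retrieval", "Knot Tying"]   # executable operations


def indicator_vector(active: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """1 where a vocabulary entry is active at this time step, 0 otherwise."""
    return [1 if name in active else 0 for name in vocabulary]


def encode_frame(states: Sequence[str], events: Sequence[str],
                 actions: Sequence[str]) -> Dict[str, List[int]]:
    """Encode one time step as a phase vector (states + events) and an action vector."""
    return {
        "phase": indicator_vector(states, STATES) + indicator_vector(events, EVENTS),
        "action": indicator_vector(actions, ACTIONS),
    }


frame = encode_frame(["Needle Holding"], ["Needle Dropping"], ["Needle Retrieval"])
```

Stacking such frames over time gives the multivariate binary series that the later regression steps operate on.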

2.3 Construction of vector autoregression (VAR) models

The core objective of this study is to learn a dynamic decision function from surgical video time-series data that can accurately describe the mapping from historical phase information to the current execution action. This function represents the conditional probability distribution of the system taking a specific action given the historical surgical context, which can be generally expressed as:

[eqn]

Where $[eqn]$ denotes the length of the historical time window influencing the current decision. This conditional probability distribution essentially defines a state transition process, mapping the sequence of past $[eqn]$ phase states $[eqn]$ to the action variable $[eqn]$ at time t.

To characterize dynamic interactions within the ‘phase’ variable $[eqn]$ and its potential influence on the ‘action’ variable $[eqn]$ , this study employs a Vector Autoregressive (VAR) model as the core temporal modeling tool. The VAR model enables simultaneous estimation of the interdependencies among multiple endogenous variables and captures the persistent effects of historical states on the current system through the introduction of lag terms. This provides an ideal structural foundation for subsequently identifying the Granger causal relationships between the “phase” and “action” variables.

The VAR-based phase-action model can be expressed as follows:

[eqn]

Where, $[eqn]$ is a linear function, $[eqn]$ is the parameter matrix to be estimated, and $[eqn]$ is a random disturbance term. This model represents action decisions as a linear combination of historical phase variables, providing interpretable parameter estimates for subsequent analysis.

More specifically, each action variable (in Section 2.2) can be written as:

[eqn]

Where, $[eqn]$ denotes the k-th action at time $[eqn]$ , $[eqn]$ is the lag order, representing the length of historical information influencing the current action, $[eqn]$ represents the influence of the i-th state at lag $[eqn]$ on action $[eqn]$ , $[eqn]$ $[eqn]$ is the value of the i-th phase variable at past time $[eqn]$ , $[eqn]$ represents the influence of the j-th random event at lag $[eqn]$ on action $[eqn]$ , $[eqn]$ $[eqn]$ is the value of the j-th random event at past time $[eqn]$ , $[eqn]$ is a random noise term at time $[eqn]$ .
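One way to write out a per-action equation that matches the term-by-term description above is given below. It is a hedged reconstruction, not the paper's typeset formula: the coefficient names $a$ and $b$, the noise symbol $\varepsilon_k$, the state count $N$, and the absence of an intercept are all assumptions, while $S_i$, $E_j$, $Z_k$, $M$, and the lag order $p$ follow the surrounding text.

```latex
Z_k(t) \;=\; \sum_{q=1}^{p} \sum_{i=1}^{N} a_{k,i}^{(q)} \, S_i(t-q)
        \;+\; \sum_{q=1}^{p} \sum_{j=1}^{M} b_{k,j}^{(q)} \, E_j(t-q)
        \;+\; \varepsilon_k(t)
```

Here $a_{k,i}^{(q)}$ plays the role of the influence of the i-th state at lag $q$ on action $k$, $b_{k,j}^{(q)}$ the influence of the j-th random event at lag $q$, and $\varepsilon_k(t)$ the random noise term.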

This formulation captures both the influence of historical phase states and abnormal events on current actions, providing a clear, interpretable framework for modeling surgical decision-making. These actions may be triggered by abnormal events (e.g., “Bleeding” triggering “Electrocoagulation”) or arise naturally from routine surgical logic (e.g., the “Suturing” state leading to “Knot Tying”). Granger causality, which will be introduced in Section 2.4, is later incorporated into the time-series modeling framework to identify and remove spurious correlations, enabling the construction of a decision model with explicit causal interpretability and strong generalization.

2.4 Dynamic decision based on Granger causal testing

Building upon the VAR-based phase-action model introduced in Section 2.3, Granger causality tests are employed to identify temporal causal relationships between surgical phases and robotic actions. Herein, the “phase component” refers to $[eqn]$ , which denotes the standard operational state variable $[eqn]$ or individual random event variable $[eqn]$ extracted from the integrated stage variable $[eqn]$ (defined in Section 2.2). In a non-Markovian surgical environment, if incorporating historical information from past phase variables $[eqn]$ and random events $[eqn]$ significantly improves the prediction of the current action $[eqn]$ , these variables are considered Granger causes of $[eqn]$ . The testing process, detailed in Algorithm 1, is performed independently for each potential (phase component, action component) causal pair.

Algorithm 1. Element-wise Granger causality testing in temporal sequences. (Flowchart: a statistical testing procedure using surgical video data, detailing inputs, output, and stepwise methods for encoding observational data, building restricted and unrestricted models, calculating sums of squares, computing an F statistic, comparing to a critical value, and determining the presence or absence of causal links.)

(1) Restricted model: Uses only the historical data of action $[eqn]$ itself for prediction, disregarding the influence of historical phase information, which can be expressed as follows:

[eqn]

Here, $[eqn]$ denotes the constant term, $[eqn]$ represents the lag order, $[eqn]$ signifies the autoregressive coefficient of $[eqn]$ at lag q.

(2) Unrestricted model: Incorporates lagged terms of all historical phase variables to explicitly capture the respective causal influences of past states and events on the current action.

[eqn]

The coefficients $[eqn]$ and $[eqn]$ represent the average effects that the state $[eqn]$ and the event $[eqn]$ have on the action $[eqn]$ at lag q, e.g., the driving effect of a G6 operation on the needle pick-up action, or the triggering effect of a dropped needle on the needle pick-up action.

(3) Causal significance $[eqn]$

An F-statistic computed from the residual sums of squares is used to assess whether the predictive improvement of the unrestricted model relative to the restricted model is statistically significant, which can be expressed as follows:

[eqn]

Here, $[eqn]$ and $[eqn]$ represent the sums of squared residuals of the restricted and unrestricted models, respectively; $[eqn]$ reflects the prediction error without causal variables, while $[eqn]$ reflects the prediction error after introducing causal variables. $[eqn]$ denotes the sample size, $[eqn]$ represents the total number of lagged terms of the causal variables, $[eqn]$ is the lag order (Q = 2p).
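The restricted versus unrestricted comparison can be sketched with ordinary least squares in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: it uses the textbook F form, ((RSS_r − RSS_u)/Q) / (RSS_u/(n_eff − k)), where k is the number of unrestricted parameters, because the paper's exact denominator is not recoverable here. The function names, the toy data, the 10% label-flip rate, and the large-sample 5% critical value of about 2.37 for Q = 4 are all illustrative.

```python
import numpy as np


def lagged(x: np.ndarray, p: int) -> np.ndarray:
    """Columns x(t-1), ..., x(t-p) aligned with targets at t = p, ..., n-1."""
    n = len(x)
    return np.column_stack([x[p - q: n - q] for q in range(1, p + 1)])


def rss(design: np.ndarray, target: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(z, causes, p: int = 2) -> float:
    """F statistic for 'lags of the cause series improve the prediction of z'."""
    z = np.asarray(z, dtype=float)
    target = z[p:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(z, p)])            # own history only
    cause_lags = [lagged(np.asarray(c, dtype=float), p) for c in causes]
    unrestricted = np.hstack([restricted] + cause_lags)          # plus phase history
    q_terms = p * len(causes)                                    # Q = 2p for two causes
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / q_terms) / (rss_u / dof)


# Toy data (assumption): Z copies E from one step earlier, with 10% of labels flipped.
rng = np.random.default_rng(0)
n = 2000
s = (rng.random(n) < 0.3).astype(float)
e = (rng.random(n) < 0.2).astype(float)
flip = rng.random(n) < 0.1
z_causal = np.zeros(n)
z_causal[1:] = np.where(flip[1:], 1.0 - e[:-1], e[:-1])
z_random = (rng.random(n) < 0.2).astype(float)                   # unrelated action series

F_CRIT = 2.37  # approximate 5% critical value of F(4, large), an illustrative threshold
f_causal = granger_f(z_causal, [s, e])
f_random = granger_f(z_random, [s, e])
is_granger_cause = f_causal > F_CRIT
```

With two cause series and p = 2, the test adds Q = 4 lagged terms, which matches the Q = 2p relation stated above.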

Based on the results of the Granger causality test, a dynamic correction mechanism for surgical procedures is established. When the system detects an abnormal event, it automatically triggers the corresponding corrective action. For example, “pick up needle + re-execute G6” can be determined by integrating its causal association “dropped needle” with the preceding state “G6 operation.” Unlike conventional predefined workflows, this mechanism generates adaptive response strategies through data-driven causal inference, enabling process reconstruction in non-sequential or unexpected surgical scenarios. The corresponding pseudocode is presented as follows:

Algorithm 2. Dynamic correction mechanism. (Flowchart: a process for generating recovery action commands based on surgical video streams, VAR models, and historical data, with steps for state retrieval, abnormal event identification, F-statistic computation, Granger-causality check, and differentiated outputs for event- or state-triggered corrections.)
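Since Algorithm 2 survives only as a flowchart description, the following is a loose sketch of its decision step rather than a transcription. It assumes the F statistics for each (cause, action) pair have already been computed, checks event-triggered links before state-triggered ones, and appends a "re-execute" command after an event-triggered recovery. The function name, the F values, the gesture `G7`, and the threshold are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_correction(state: str, event: Optional[str],
                      f_stats: Dict[Tuple[str, str], float],
                      f_crit: float) -> List[str]:
    """Pick the action with the strongest significant link, checking the event first."""
    for cause, is_event in ((event, True), (state, False)):
        if cause is None:
            continue
        # Keep only actions whose link from this cause passes the Granger threshold.
        links = {action: f for (c, action), f in f_stats.items()
                 if c == cause and f > f_crit}
        if links:
            best = max(links, key=links.get)
            # Event-triggered recovery resumes the interrupted state afterwards.
            return [best, f"re-execute {state}"] if is_event else [best]
    return []


# Hypothetical precomputed F statistics for (cause, action) pairs.
F_STATS = {
    ("dropped needle", "pick up needle"): 40.0,
    ("G6", "pick up needle"): 1.1,
    ("G6", "G7"): 12.0,
}

commands = select_correction("G6", "dropped needle", F_STATS, f_crit=2.37)
```

When no abnormal event is detected, the same call falls back to the state-triggered link and returns the routine next action.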

3 Results

This section employs synthetic data to simulate clinically abnormal scenarios, thereby quantitatively evaluating the causal reasoning framework’s capability to dynamically model the sequential logic of surgical procedures.

3.1 Dataset

Due to the scarcity of annotated data on abnormal events during actual surgical procedures, coupled with ethical and privacy constraints, this study employs synthetic data to validate a surgical process modeling approach based on Granger causality testing. The dataset design focuses on the causal relationship between “abnormal events” and “countermeasures,” comprising three core temporal variables:

$S_i$ (original gesture, e.g., G6 “left hand pulling thread”).

$E_j$ (abnormal event, e.g., “dropped needle”).

$Z_k$ (recovery action, e.g., “pick up needle + re-execute G6”).

A variable value of 1 indicates that an event has occurred, while 0 indicates that it has not.

This study employs both positive and negative examples as samples. Positive examples are constructed based on clinical logic, establishing explicit causal relationships such as “needle drop → needle retrieval” and “inaccurate positioning → repositioning.” This ensures that $Z_k$ is driven by historical data from $S_i$ and $E_j$, thereby simulating the anomaly handling encountered in actual surgical procedures. Negative examples are generated by randomly producing sequences of unrelated variables ($S_i$, $E_j$, $Z_k$) so that no temporal dependence links them, thereby validating the model’s discriminative capability in scenarios lacking genuine causal relationships.

To enhance the clinical plausibility of synthetic data, three senior consultants reviewed all predefined causal relationships, yielding a Kappa consistency score of 0.92. Furthermore, to better simulate real surgical environments, we introduced clinical noise in 20% of samples (specifically simulating gesture recognition bias caused by tissue occlusion) to bolster both data authenticity and model robustness. In this study, two datasets of different scales were constructed: one containing 5,000 samples (2,500 positive and 2,500 negative), and the other containing 10,000 samples (5,000 positive and 5,000 negative). The goal is to evaluate the model’s stability and generalization ability under varying data scales.
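A dataset of this shape could be generated roughly as follows. The sketch uses the generation probabilities (0.3 for gestures, 0.2 for abnormal events) and the 0.5 lag weights reported in the experimental setup, but the generative rule itself is an assumption: positive samples draw the recovery action with probability 0.5·S(t−1) + 0.5·E(t−1), and noisy samples flip 5% of gesture labels. The sequence length of 50, the flip rate, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[int], List[int], List[int]]  # (S, E, Z) binary sequences


def make_sample(causal: bool, length: int, rng: random.Random,
                p_s: float = 0.3, p_e: float = 0.2, w: float = 0.5) -> Sample:
    """Positive samples tie Z to lagged S and E; negative samples are independent."""
    s = [int(rng.random() < p_s) for _ in range(length)]
    e = [int(rng.random() < p_e) for _ in range(length)]
    if causal:
        # Assumed rule: P(Z_t = 1) = w * S_{t-1} + w * E_{t-1}
        z = [0] + [int(rng.random() < w * s[t - 1] + w * e[t - 1])
                   for t in range(1, length)]
    else:
        z = [int(rng.random() < p_e) for _ in range(length)]
    return s, e, z


def add_recognition_noise(sample: Sample, rng: random.Random,
                          flip_rate: float = 0.05) -> Sample:
    """Flip a fraction of gesture labels to mimic occlusion-induced recognition bias."""
    s, e, z = sample
    noisy_s = [1 - v if rng.random() < flip_rate else v for v in s]
    return noisy_s, e, z


def build_dataset(n_pos: int, n_neg: int, length: int = 50,
                  noisy_fraction: float = 0.2, seed: int = 0):
    """Return a list of ((S, E, Z), label) pairs, with label 1 for causal samples."""
    rng = random.Random(seed)
    data = []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            sample = make_sample(bool(label), length, rng)
            if rng.random() < noisy_fraction:
                sample = add_recognition_noise(sample, rng)
            data.append((sample, label))
    return data


dataset = build_dataset(n_pos=25, n_neg=25)
```

Calling `build_dataset(2500, 2500)` or `build_dataset(5000, 5000)` would give the two dataset scales described above, though the statistics will differ from the paper's because the generative rule here is assumed.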

3.2 Experimental setup

When constructing the VAR model, the selection of the lag order p is crucial to model performance. To effectively capture the influence of prior actions on the current recovery operation while avoiding excessive noise, p must be properly determined. Considering that intraoperative abnormal events typically span about two action steps from occurrence to response (for example, after a “needle drop,” the surgeon must first “pick up the needle” and then “rethread”), this study preliminarily sets the lag order to p = 2.

To refine the choice of lag order and improve reproducibility and transparency, the Akaike Information Criterion (AIC) was used for validation. As a widely recognized information-theoretic metric for model selection, the AIC balances goodness of fit against model parsimony: lower values indicate a better trade-off between capturing meaningful temporal dependencies and limiting the risk of overfitting. Comparing AIC values under different p settings shows that p = 2 yields the lowest AIC (−3.2), outperforming p = 1 (AIC = −2.8) and p = 3 (AIC = −2.9). The lag order was therefore fixed at p = 2, which represents the surgical dynamics accurately while avoiding overfitting.
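The lag comparison can be illustrated for a single action equation. This sketch assumes a per-observation Gaussian AIC, ln(RSS/n) + 2k/n, fitted by OLS on a common sample so that candidate lags are comparable; the paper does not state which AIC variant it used, so the absolute values will not match those reported. The toy series, in which the action depends on the event two steps earlier, and the function names are hypothetical.

```python
import math

import numpy as np


def design_matrix(z, s, e, p: int, start: int) -> np.ndarray:
    """Intercept plus lags 1..p of z, s and e, aligned with targets z[start:]."""
    n = len(z)
    cols = [np.ones(n - start)]
    for series in (z, s, e):
        for q in range(1, p + 1):
            cols.append(series[start - q: n - q])
    return np.column_stack(cols)


def aic_for_lag(z, s, e, p: int, max_p: int) -> float:
    """Per-observation AIC of the OLS fit with lag order p (common sample from max_p)."""
    target = z[max_p:]
    x = design_matrix(z, s, e, p, max_p)
    coef, *_ = np.linalg.lstsq(x, target, rcond=None)
    resid = target - x @ coef
    n_obs = len(target)
    return math.log(float(resid @ resid) / n_obs) + 2 * x.shape[1] / n_obs


# Toy data (assumption): Z depends on E two steps back, with 10% of labels flipped.
rng = np.random.default_rng(0)
n = 2000
s = (rng.random(n) < 0.3).astype(float)
e = (rng.random(n) < 0.2).astype(float)
z = np.zeros(n)
flip = rng.random(n - 2) < 0.1
z[2:] = np.where(flip, 1.0 - e[:-2], e[:-2])

aic = {p: aic_for_lag(z, s, e, p, max_p=3) for p in (1, 2, 3)}
best_p = min(aic, key=aic.get)
```

On data like this, p = 1 misses the lag-2 dependence and scores clearly worse, while p = 3 adds parameters that contribute almost nothing to the fit.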

The experimental parameters are set as follows: the sample size n = 5,000, the lag order p = 2, the generation probabilities for the variables $[eqn]$ (normal gesture) and $[eqn]$ (abnormal event) are 0.3 and 0.2, respectively, and the weighting coefficient for the lagged term $[eqn]$ and that for the lagged term $[eqn]$ are both 0.5. The regression coefficients (including $[eqn]$ , $[eqn]$ , $[eqn]$ , and $[eqn]$ ) are estimated using Ordinary Least Squares (OLS), with a total of Q = 4 lag terms.

3.3 Experimental results

Table 1 and Figure 3 show the performance of the proposed method across sample sizes. The model achieved an accuracy of at least 95.60% on both the 5,000- and 10,000-sample datasets (95.60% and 95.66%, respectively), with the Matthews correlation coefficient (MCC) stable at 0.912. This indicates strong discriminative capability that holds as the dataset grows.

It is worth noting that the model’s recall consistently exceeds its precision, indicating a preference for minimizing the rate of missed true causative events—that is, reducing false negatives. In safety-critical surgical scenarios, this bias toward “better to err on the side of false positives than false negatives” aligns with the safety-first clinical principle. It helps ensure all critical anomalies are effectively identified and trigger appropriate response mechanisms.
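The relationship between recall, precision, and the other reported metrics can be checked from confusion-matrix counts. The counts below are not taken from the paper, since the Figure 3 matrices are not reproduced in this text: they are hypothetical values chosen to be roughly consistent with the 2,500/2,500 row of Table 1.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, back-derived to approximate the 5,000-sample results.
metrics = binary_metrics(tp=2397, fp=117, fn=103, tn=2383)
recall_exceeds_precision = metrics["recall"] > metrics["precision"]
```

Recall exceeds precision exactly when false negatives are fewer than false positives, which is the safety-oriented bias described above.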

4 Discussion

In high-complexity surgical scenarios such as neurosurgery and spinal surgery, confounding factors including instrument slippage, tissue deformation (6, 7), and visual field occlusion embedded in task-level video data exhibit distinct characteristics of cross-phase correlation and dynamic coupling over the temporal dimension. The non-Markovian nature shaped by these long-range causal dependencies profoundly underscores the inherent complexity of unstructured surgical environments, emerging as a key bottleneck that hinders surgical robots from achieving high-level autonomous decision-making. However, traditional temporal modeling methods are constrained by short-range dependency assumptions (8, 12), impeding their ability to effectively capture such long-range causal structures; meanwhile, existing systems based on deterministic rules or Markov Decision Processes (MDPs) fail to dynamically respond to and adaptively adjust for process disruptions induced by abnormal events, primarily due to their rigid architectures and inadequate state modeling. Ultimately, this severely compromises the reliability, robustness, and clinical applicability of these systems in real-world complex surgical settings.

Inspired by local causality discovery and non-stationary adaptive learning theories (21, 22), this study proposes a dynamic reasoning framework based on Granger causality testing. This framework translates the clinical logic of “surgical gesture–abnormal event–recovery action” into computable temporal causal hypotheses. Through vector autoregressive modeling and Granger significance testing, it achieves data-driven identification of abnormal causal chains and dynamic decision-making.

Experimental results demonstrate that the framework achieves strong and stable performance on synthetic datasets. Accuracy was at least 95.60% and the MCC was 0.912 at both the 5,000- and 10,000-sample scales. When the sample size increased to 10,000, the F1 score was 95.77% (against 95.60% at 5,000 samples), indicating stable performance and a robust capability to distinguish genuine causality from coincidental correlations. The core advantage of this approach lies in constructing causal chains linking “original gesture-abnormal event-recovery action” (e.g., “dropping needle → picking up needle → resuming threading”). This not only captures the direct association between “abnormal event → countermeasure” but also quantifies the dynamic influence of historical states on current decisions through the VAR model’s lagged term coefficients. Consequently, it overcomes the traditional Markov model’s reliance on “fixed temporal sequences”. Crucially, recall exceeds precision at both scales (95.88% vs. 95.34% at 5,000 samples; 96.44% vs. 94.96% at 10,000 samples), reflecting a cautious bias: it is better to report a false positive than to miss a true positive. In safety-critical surgical scenarios, minimizing false negatives (i.e., missed detections of abnormal causal pairs) aligns closely with the safety-first clinical principle. It helps ensure that critical anomalies (such as “dropped needles”) are identified and trigger appropriate responses, thereby enhancing the system’s inherent safety. The confusion matrices further confirm that the false negative rate remains below 5% at both scales (4.12% and 3.56%, from the recall values in Table 1), demonstrating substantial application potential in surgical anomaly modeling and autonomous response.

Despite these initial advances, several limitations warrant further investigation. First, model validation currently relies on synthetic data; while high performance is achieved under simplified assumptions, the framework may encounter generalization challenges in real surgical environments. Inherent complexities, including continuous tissue deformation, variable instrument–tissue interactions such as context-dependent friction or adhesion, and instrument occlusion, introduce unmodeled dynamic noise that synthetic datasets do not fully capture, potentially undermining the framework’s reliability during clinical transition. To bridge the gap between simulation and clinical practice, we plan to collaborate with clinical institutions to collect and annotate real surgical videos that include intraoperative anomalies, across specialties such as neurosurgery and spinal surgery. A high-quality dataset must satisfy two core criteria: the video library must cover a wide spectrum of surgical scenarios to ensure representativeness, and it must document surgical phases, potential anomalous events, and their respective mitigation strategies. To meet these requirements, experienced surgeons will systematically review surgical footage and generate descriptive narratives, which large language models will then parse and structure to refine the classification of standard operative states and abnormal events. Once rigorous annotation protocols have been formulated, multiple professionals will label the data independently, and inter-annotator consistency will be verified with the Kappa coefficient to support the dataset’s reliability and generalizability. Second, the current Vector Autoregressive (VAR) model may struggle to capture the extended temporal dependencies inherent in complex procedures. Future work will therefore explore advanced architectures such as Transformers to improve the modeling of long-range causal relationships and the robustness of decision-making.
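As an illustration of the planned consistency check, the sketch below computes Cohen's kappa for two annotators. The text does not specify which Kappa variant will be used, so the two-rater form is an assumption (Fleiss' kappa would be the usual choice for more than two raters), and the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:                      # both annotators used one identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight video segments from two annotators.
annotator_1 = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly", "normal", "normal"]
annotator_2 = ["normal", "normal", "anomaly", "normal", "normal", "anomaly", "normal", "anomaly"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")  # 6/8 observed vs 34/64 chance agreement -> 0.467
```

Here the annotators agree on 6 of 8 segments, but more than half of that agreement would be expected by chance given how often each uses the "normal" label, which is why kappa is well below the raw 75% agreement.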

In summary, this study addresses the critical issue of long-range dependencies triggered by abnormal events in non-Markovian surgical environments through a novel causal dynamic reasoning framework. By integrating Granger causality testing with vector autoregressive models, the proposed method constructs interpretable “surgical gesture → abnormal event → recovery action” chains, enabling autonomous reasoning and dynamic decision-making. The framework’s high accuracy, stability, and inherent bias toward safety, evidenced by recall consistently exceeding precision, provide a robust and clinically aligned foundation for handling surgical disruptions. The insights and architecture presented here mark a significant step in moving surgical robots from programmed execution toward cognitive, adaptive decision-making.
