The AENEAS Project: Intraoperative Anatomical Guidance Through Real-Time Landmark Detection Using Machine Vision
Simone Olei, Gary Sarwin, Victor E. Staartjes, Luca Zanuttini, Seungjun Ryu, Luca Regli, Ender Konukoglu, Carlo Serra

TL;DR
This paper explores using machine vision to detect anatomical landmarks during complex brain surgery, showing promising results for deep structures like the optic nerve.
Contribution
The study introduces a deep learning model for real-time anatomical landmark detection in microsurgery, validated with a large dataset of surgical videos.
Findings
The model achieved high precision for detecting deep structures like the optic nerve (AP50: 0.73) and internal carotid artery (AP50: 0.67).
Superficial structures had lower precision due to morphological similarity and optical variability.
Performance variability highlights the complexity of the anatomical setting and data limitations.
Abstract
To investigate the performance of a deep learning machine vision-based model in identifying anatomical landmarks in a complex microsurgical setting, such as the pterional trans-Sylvian approach. We developed a deep learning object detection model (YOLOv7x) trained on 5307 labeled frames from 78 surgical videos of 76 patients undergoing pterional trans-Sylvian approach from January 1, 2020 to June 31, 2024. Surgical steps were standardized, and key anatomical targets—frontal/temporal dura, inferior frontal/superior temporal gyri, optic and olfactory nerves, and internal carotid artery—were annotated by specifically trained neurosurgical residents and verified by the operating surgeon. Bounding boxes derived from segmentation masks served as training inputs. Performance was evaluated using 5-fold cross-validation. The model achieved promising detection performance for deep structures,…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoft Robotics and Applications · Surgical Simulation and Training · Medical Imaging and Analysis
The success of any surgical operation depends on 2 factors: a cognitive factor, that is, the operator’s ability to know what should be done, according to a sort of mental “roadmap,” and a technical factor, that is, the ability to perform what the operator sets out to do. The technical factor is a consequence of the operator's dexterity, improvable through repeated training and ideally, at least theoretically, replaceable or improvable in a future robotic context. The cognitive factor, that is, the ability to move according to a correct mental roadmap, on the contrary, is more complex and is a consequence of a long learning curve. This learning curve is highly individual, difficult to standardize and is probably the main cause behind the high interindividual variability among surgeons. The same surgical procedure can be routine in the hands of one surgeon and extremely risky in the hands of another, depending on the operator’s experience (ie cognitive factor) with the pathology, the technique, and the surgical anatomy involved. Standardization of this cognitive process, obviously in an ameliorative sense, would allow, if achieved to considerably improve the level of surgery on a global scale.
The construction of the mental roadmap recognizes several steps, of which the first is unquestionably the perfect knowledge of normal neuroanatomy and the exact understanding of pathological anatomy in general and of the individual patient in particular. It cannot be considered a coincidence that most of the technological developments introduced into the neurosurgical routine since the very beginning of this discipline (microscope, endoscope, neuronavigation, intraoperative ultrasound, electrophysiological monitoring, and intraoperative magnetic resonance imaging)1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 have correct recognition of the intraoperative anatomy and discernment of normal and pathological anatomy as their ultimate goal. The recent introduction of artificial intelligence methods in medicine, pertaining particularly to the field of so-called machine vision, opens up further possibilities for improving what is already available. In a previous work we have already reported how, by using a deep-learning-based object detection method (YOLO) applied to a large database of surgical videos related to endoscopic transsphenoidal surgery (TSS), it is possible to create an algorithm capable of recognizing normal anatomy visible in the nasal and sphenoidal phases of the TSS itself.15^,^16 The implications are obvious and promising but need further investigation and validation to fully understand its potential.
First and foremost, the question is whether the results obtained from modeling a highly repetitive and standardized surgical approach such as the TSS, where the surgical steps are repeated almost always in the same sequence, under superimposable optical conditions, with visualization of clearly distinguishable anatomical structures, are also obtainable for other neurosurgical approaches, particularly the intracranial ones.
Therefore, the purpose of this study is to evaluate (a) whether it is possible to create a machine vision algorithm that can also recognize normal intracranial brain anatomy, and (b) to evaluate its performance. To this end, we chose to test this methodological approach on a database of videos of pterional trans-Sylvian (PTS) approaches, ie a commonly used neurosurgical procedure whose steps are known and all in all fairly repetitive but which presents greater variability in the sequence of these, in the morphology of the visualized anatomical structures, and above all in the optical conditions of visualization (brightness/depth/viewing angle).
Patients And Methods
Data Acquisition, Surgical Technique, and Labeling
In this study, we trained and validated a deep learning algorithm capable of detecting key anatomical structures encountered along a standard PTS approach. We extracted videos of PTS approaches from our institutional surgical video database, prospectively collecting videos of all operations from January 1, 2020, to June 31, 2024. Operations were performed by one neurosurgeon (C.S.) according to the common microneurosurgical tenets. Patients’ heads were variably rotated and tilted according to the pathology. After standard pterional craniotomy with drilling of the ala minor down to the lateral edge of the superior orbital fissure, the dura was incised and reflected anteriorly to expose the Sylvian fissure and related superior temporal gyrus (STG) and inferior frontal gyrus (IFG). Under the surgical microscope the basal cisterns were reached subfrontally, and the carotid as well as chiasmatic cisterns were opened thereby, invariably exposing the internal carotid artery (ICA), the optic nerve (II), and the olfactory nerve (I) in all surgical videos.
Surgical video frames were labeled with a dedicated software (VoTT) by specifically trained neurosurgical residents (S.O. and L.Z.) and checked according to an internal quality check protocol by the operating neurosurgeon. Both right-sided and left-sided cases were included. Labeling was done into 7 different classes were labeled (see also Table 1): superficial targets included the temporal dura (TD) mater, the frontal dura (FD) mater, the IFG, and the STG. The deep targets included I, II, and ICA. Given the close reciprocal proximity of some of the targets encountered in this approach, the labeling procedure was performed by coarse contour segmentation. The total amount of labels per anatomical class is shown in Table 1. Visual examples of labels are shown in Figure 1. The use of patient data from the surgical registry was approved by the local ethical review boards (KEK 2023-02265). All patients signed research consent forms.Table 1. Number of Labels per ClassTDFDIFGSTGIIIICA# Labels18592715112984324623071594FD, frontal dura; ICA, internal carotid artery; IFG, inferior frontal gyrus; I, olfactory nerve; STG, superior temporal gyrus; TD, temporal dura; II, optic nerve.Figure 1. Manual labeling of anatomical targets. The anatomical segmentation and labeling of different structures are represented in this picture. (A) shows an illustrative right-sided case, where the FD mater, TD mater, IFG, and STG are visualized with the surgical microscope. In (B), the same structures are labeled with different colors using the VoTT software: FD mater is labeled with green, TD mater with red, IFG with blue, and STG with light blue. (C) shows an illustrative left-sided case, where the ICA and optic nerve (II) are visualized. In (D), the same structures are labeled: ICA with pink and II with blue. FD, frontal dura; ICA, internal carotid artery; IFG, inferior frontal gyrus; PTS, pterional trans-Sylvian; STG, superior temporal gyrus; TD, temporal dura.
Model Development and Evaluation
As a first proof of concept, the chosen task was object detection, a problem simpler than segmentation, and therefore the segmentation masks were transformed into bounding boxes. For this task the YOLOv7x model was used. This model was trained using the standard configuration as reported in Wang et al17 for 300 epochs; the configuration file is provided as Supplemental Material (available online at https://www.mcpdigitalhealth.org/). To evaluate performance, the dataset was randomly divided into 5 folds with an equal number of patients, ensuring that all data from a given patient were included in only one-fold. Then 5 different configurations of training, validation, and test data, were used to train the network and generate object detection results. For each configuration a different fold was used as the test set, and the resulting 3 and 1-folds for training and validation, respectively. The average precision at an intersection over union threshold of 0.50 (AP_50_) results at an intersection-over-union threshold of 0.5 (AP_50_) for the different configurations are reported in the results section below and shown in Table 2.Table 2AP_50_, Precision and Recall Results for 5-Fold Cross-ValidationClassAP_50_ (mean ± SD [max-min])Precision (mean ± SD [max-min])Recall (mean ± SD [max-min])Mean0.39 ± 0.06 (0.46-0.31)0.45 ± 0.07 (0.50-0.34)0.42 ± 0.04 (0.49-0.38)II0.73 ± 0.10 (0.85-0.61)0.62 ± 0.11 (0.79-0.52)0.75 ± 0.09 (0.88-0.63)ICA0.67 ± 0.10 (0.80-0.58)0.66 ± 0.06 (0.70-0.55)0.68 ± 0.10 (0.80-0.60)FD0.45 ± 0.11 (0.61-0.35)0.51 ± 0.07 (0.59-0.44)0.51 ± 0.14 (0.67-0.29)STG0.33 ± 0.20 (0.57-0.20)0.48 ± 0.20 (0.70-0.16)0.30 ± 0.17 (0.49-0.12)IFG0.25 ± 0.12 (0.42-0.14)0.37 ± 0.06 (0.47-0.32)0.27 ± 0.16 (0.52-0.11)TD0.25 ± 0.11 (0.42-0.13)0.38 ± 0.09 (0.50-0.28)0.35 ± 0.10 (0.47-0.25)I0.05 ± 0.04 (0.12-0.02)0.13 ± 0.08 (0.24-0.02)0.1 ± 0.07 (0.21-0.05)AP_50_, average precision at an intersection over union threshold of 0.50; FD, frontal dura; ICA, internal carotid artery; IFG, inferior frontal gyrus; max, maximum; min, minimum; I, olfactory nerve; SD, standard deviation; STG, superior temporal gyrus; TD, temporal dura; II, optic nerve.
Results
Data from a total of 78 surgical videos from 76 patients (due to reoperation in recurrent cases) undergoing PTS approach were collected for a total of 5307 labeled images. Surgical indications were as follows: frontal/temporal/insular gliomas and metastases, aneurysms of the anterior circulation, craniopharyngiomas, tuberculum sellae meningiomas, tumors of the limbic system, hypothalamic tumors, meningiomas of the sphenoid wing, olfactory meningiomas, anterior clinoid meningiomas, suprasellar arachnoid cysts, third-ventricular masses, meningiomas of the planum sphenoidalis, cavernous sinus meningioma, putaminal germinoma, lymphoplasma-histioproliferative mass of the optic canal, interpeduncular chordoma, hemangiopericytoma of the middle cranial fossa, and septal mass.
The achieved mean AP_50_ (see Table 2) was 0.39: values were lower in case of I and in case of all superficial structures (TD mater, FD mater, IFG, and STG). Interestingly, however, for the remaining deep structures, the II and ICA, better AP_50_ values could be reached: 0.73 (range 0.61-0.85) in case of the II and 0.67 (range 0.80-0.58) in the case of the ICA. Qualitative test results are visualized in Figure 2.Figure 2. Automatic anatomical recognition. The model displays various confidence scores for superficial and deep targets. (A) shows a right-sided illustrative case of automatic recognition through bounding boxes around FD mater and TD mater, with confidence scores of 0.47 and 0.31, respectively. (B) shows a left-sided illustrative case, where bounding boxes tightly surround the optic nerve (II) and ICA, displaying considerably higher confidence scores of 0.85 and 0.79, respectively. In this case, anatomical recognition is more precise. The confidence score is the product of the objectness probability (how likely a box contains an object) and the highest-class probability, both predicted by the model for each bounding box. FD, frontal dura; TD, temporal dura; ICA, internal carotid artery; I, olfactory nerve.
The model displays various confidence scores for superficial and deep targets. Figure 2A shows a right-sided illustrative case of automatic recognition through bounding boxes around FD mater and TD mater, with confidence scores of 0.47 and 0.31, respectively. Figure 2B shows a left-sided illustrative case, where bounding boxes tightly surround II and ICA, displaying considerably higher confidence scores of 0.85 and 0.79, respectively. In this case, anatomical recognition is significantly more precise. The confidence score is the product of the objectness probability (how likely a box contains an object) and the highest-class probability, both predicted by the model for each bounding box.
The confusion matrix in Figure 3 summarizes the model’s performance across the 7 anatomical targets and the background class. Overall detection results are promising, with limited confusion between anatomically distinct structures. Most misclassifications occur among visually similar superficial regions, such as the TD mater and FD mater or between the IFG and STG. Because the microscopic surgical scene evolves slowly rather than changing rapidly, detecting every frame is less critical than maintaining low interclass confusion, which the model achieves effectively.Figure 3. Confusion matrix. The confusion matrix in Figure 3 summarizes the model’s performance across the 7 anatomical targets and the background class. Overall detection results are promising, with limited confusion between anatomically distinct structures. Most misclassifications occur among visually similar superficial regions, such as the TD mater and FD mater or between the IFG and STG. Because the microscopic surgical scene evolves slowly rather than changing rapidly, detecting every frame is less critical than maintaining low interclass confusion, which the model achieves effectively. FD, frontal dura; FN, false negative; FP, false positive; ICA, internal carotid artery; IFG, inferior frontal gyrus; I, olfactory nerve; STG, superior temporal gyrus; TD, temporal dura; II, optic nerve.
Finally, a video is included showing the detection results on a test video. (see Supplemental Video, available online at https://www.mcpdigitalhealth.org/)
Discussion
In this preliminary study, we reported how a machine vision approach can be applied to the identification of intracranial anatomical structures visible during a PTS approach. The novelty from a machine vision point of view is determined by the fact that the PTS approach presents, at least from a theoretical point of view, modeling difficulties that make it potentially much more difficult to analyze than the TSS endoscopic approach. However, confirmation that the same approach we have illustrated in previous publications15^,^16^,^18 can also be used in the field of microscopic intracranial neurosurgery opens up very interesting prospects for the use of machine vision techniques, and therefore artificial intelligence in general, in improving neurosurgery by perfecting intraoperative anatomical recognition.
Machine Vision and Neurosurgery
The use of computer vision in neurosurgery is a relatively recent development, with algorithms being developed mainly for surgical skill assessment, workflow optimization, segmenting instruments, and assess the surgeon’s hand movement in the operating room.19, 20, 21 Other applications of computer vision include surgical instrument tracking, with preliminary efforts addressing the real-time identification of events like brain retraction or blood loss.22, 23, 24 More recent is the development of algorithms capable of identifying anatomical structures visible during endoscopic TSS, as indicated also by our research group.15^,^16 However, the endoscopic TSS approach has some specific features that make it unique in its own way. The operative field is essentially a linear corridor from the nostril to the floor of the sella through the sphenoid sinus. The anatomical structures always appear in the same sequence and almost invariably in the same 3-dimensional topographical relationships with respect to the endoscope, ie from the operator’s point of view. The endoscope itself provides a focused, high-definition visualization of the entire space being framed, and the image is transmitted in digital form to a screen for indirect viewing.13^,^14
From a machine vision perspective, intracranial microneurosurgery approaches differ from transnasal trans-sphenoidal endoscopy in 2 particular aspects: first, the method of acquiring and viewing images is different due to the use of a microscope instead of an endoscope. The optical characteristics of the surgical microscope impose a shallow depth of field and reduced illumination at deeper focal planes. As a result, only narrow regions are in sharp focus, while surrounding structures appear blurred or underexposed, making it difficult for models to consistently extract relevant features. Instrument occlusions in this already narrow field of view increase the complexity even further.14
Second, the object being viewed, ie the surgical field, is much more variable in terms of both the anatomical structures viewed (number of, morphological aspect, etc.) and their reciprocal topographical relationships due to the greater range and freedom of movement of the microscope relative to the focal point. Microscope-assisted surgery involves frequent changes in camera angle, magnification, and position, leading to inconsistent perspectives across frames. The surgeon, and therefore also the microscope, are positioned at the head of the patient, but their position, and therefore the viewing angle of the same anatomical structure, can change continuously, assuming virtually any position along the surface of the hemisphere centered on the structure under observation. To further illustrate this point, the visualized spatial relationships displayed between the ICA and the ipsilateral optic nerve or between the splenium and the Vena Galeni may radically change depending on the position of the microscope, whereas such radical shifts do not occur in TSS approaches.25 Therefore, the model must learn to recognize anatomical structures across a much wider range of viewpoints and spatial configurations.
Machine Vision and PTS Approach
In moving from a highly reproducible and controlled setting such as endoscopic TSS to the more variable setting of intracranial microneurosurgery, we chose to start with the PTS approach because, of all the intracranial approaches, it is probably the most codified and with the least extreme variability in relative positioning between the microscope and the surgical anatomy, at least in its initial steps. The aim was, in a sense, to subject our machine vision model to a kind of stress test with more gradual changes in the key variables.
Although in this new context the machine vision system is confronted with more complex and variable conditions, the results obtained in the detection of the 7 anatomical targets considered are optimistic. In this proof-of-concept study, the application of a machine vision approach to 78 surgical videos from 76 patients undergoing surgery through a PTS approach for different indications, allows without a priori anatomical knowledge to identify key anatomical targets, the optic nerve and ICA, with promising accuracy. This represents a notable step forward from earlier studies from our group, showing how the same machine vision model can manage not only anatomical recognition in a relatively simple and linear surgical scenario, such as pituitary surgery, but also in a more complex setting, such as the PTS approach.26
Still, AP_50_ for some more superficially located structures (TD mater, FD mater, IFG, and STG) was less satisfying. This can be attributed to several factors. On the contrary, the setting itself is more complex, as previously discussed. Second, the number of labeled images used in this experiment is approximately one-quarter of those available for the TSS approach. In addition, the dataset presents a degree of class imbalance, particularly for underrepresented structures such as the olfactory nerve, which likely contributed to variability in performance across targets. A third reason may be attributable to intrinsic limitations of the YOLO algorithm in clearly distinguishing certain anatomical structures from others. The fact that the algorithm has difficulty distinguishing the TD mater from the FD mater and the STG from the IFG could suggest a difficulty in recognizing structures that are visually similar from a morphological point of view and whose discrimination is based exclusively on topographical criteria. In other words, while the ICA and optic nerve are clearly distinguishable not only by their relative position in space but also by their visual appearance, the same cannot be said when differentiating between 2 cerebral convolutions. If the latter hypothesis were true, it would be necessary to either use other object recognition algorithms or change the input methods for the algorithm itself. Alternatively, the question arises as to whether the poor AP_50_ in structure recognition can be resolved simply by increasing the number of labeled frames. In this study, given the complex setting in which the algorithm was applied, considering the different orientation and depth of the anatomic structures encountered throughout the operations and the small dimensions and spatial density of the deep cisternal structures, the labeling procedure on the surgical images was performed not as bounding boxes or single point annotations, but as contour segmentation. This kind of annotation procedure is time-intensive and requires attention to anatomical detail, especially when contiguous targets are considered (eg the optic nerve and the carotid artery), but appears to minimize inaccuracies due to similarity or possible overlapping between structures.
Limitations and Future Perspectives
The model showed promising performance in detecting the requested anatomy in the test set, particularly the optic nerve and the ICA in a common transcranial neurosurgical approach such as the PTS approach, representing nonetheless a complex setting due to the many degrees of freedom of the camera and operative factors. A limitation of this preliminary work is that all data were acquired from a single institution using a uniform imaging pipeline, which may limit the generalizability of the results. External validation on multicenter datasets with heterogeneous imaging conditions is our future research to assess the robustness and reproducibility of the model.
Being this study a proof of concept only, the number of structures tested is limited so that they represent key landmarks for a hypothetic future pterional roadmap at different depths in the surgical field. An actual roadmap for such an approach would require mapping a greater number of targets, specifically in the superficial Sylvian fissure splitting phase and in deeper cisternal dissection. For the same reason, only normal anatomy has been tested: recognition of pathological targets, such as tumors or aneurysms, is conceivable as a future step. Moreover, anatomic variants may further complicate the performance.
From a practical standpoint, this study was designed as a proof of concept focused on the feasibility of anatomical recognition rather than clinical implementation. Quantitative assessment of clinical utility parameters such as latency and display integration is currently ongoing and will be reported separately once finalized. Preliminary results suggest that real-time performance is technically feasible with contemporary architectures.
Depending on how these aspects are addressed, it is conceivable that machine vision-based algorithms could in the future be implemented in surgical practice as tools to assist intraoperative anatomical recognition. Whether such systems will eventually evolve toward autonomous intraoperative guidance or remain as complementary aids to existing neuronavigation platforms remains an open question.
Conclusion
With this study, we have reported in a proof-of-concept setting the possibility of training machine-vision-based algorithms capable of automatically recognizing anatomical structures visible during the microsurgical trans-Sylvian pterional approach. Notably, for 2 out of the 3 deeper structures (optic nerve and ICA), a very satisfactory AP_50_ score could be achieved. The results of our study also show that the accuracy of automated anatomical recognition, expressed in AP_50_, is not equally good for all anatomical structures, thus possibly highlighting the importance of the number of labels per class. Nevertheless, they represent an indispensable first step toward machine vision-guided surgical roadmaps supporting intraoperative orientation in intracranial microneurosurgery.
Potential Competing Interests
The authors report no competing interests.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Orringer D.A.Golby A.Jolesz F.Neuronavigation in the surgical management of brain tumors: current and future trends Expert Rev Med Devices 95201249150010.1586/erd.12.4223116076 PMC 3563325 · doi ↗ · pubmed ↗
- 2Senft C.Bink A.Franz K.Vatter H.Gasser T.Seifert V.Intraoperative MRI guidance and extent of resection in glioma surgery: a randomised, controlled trial Lancet Oncol 12112011997100310.1016/S 1470-2045(11)70196-621868284 · doi ↗ · pubmed ↗
- 3Staartjes V.E.Togni-Pogliorini A.Stumpo V.Serra C.Regli L.Impact of intraoperative magnetic resonance imaging on gross total resection, extent of resection, and residual tumor volume in pituitary surgery: systematic review and meta-analysis Pituitary 244202164465610.1007/s 11102-021-01147-233945115 PMC 8270798 · doi ↗ · pubmed ↗
- 4Serra C.Burkhardt J.K.Esposito G.Pituitary surgery and volumetric assessment of extent of resection: a paradigm shift in the use of intraoperative magnetic resonance imaging Neurosurg Focus 4032016 E 1710.3171/2015.12.FOCUS 1556426926057 · doi ↗ · pubmed ↗
- 5Hammoud M.A.Ligon B.L.Elsouki R.Shi W.M.Schomer D.F.Sawaya R.Use of intraoperative ultrasound for localizing tumors and determining the extent of resection: a comparative study with magnetic resonance imaging J Neurosurg 845199673774110.3171/jns.1996.84.5.07378622145 · doi ↗ · pubmed ↗
- 6Burkhardt J.K.Serra C.Neidert M.C.High-frequency intra-operative ultrasound-guided surgery of superficial intra-cerebral lesions via a single-burr-hole approach Ultrasound Med Biol 40720141469147510.1016/j.ultrasmedbio.2014.01.02424680295 · doi ↗ · pubmed ↗
- 7Ulrich N.H.Burkhardt J.K.Serra C.Bernays R.L.Bozinov O.Resection of pediatric intracerebral tumors with the aid of intraoperative real-time 3-D ultrasound Childs Nerv Syst 281201210110910.1007/s 00381-011-1571-121927834 · doi ↗ · pubmed ↗
- 8Stummer W.Stepp H.Wiestler O.D.Pichlmeier U.Randomized, prospective double-blinded study comparing 3 different doses of 5-aminolevulinic acid for fluorescence-guided resections of malignant gliomas Neurosurgery 812201723023910.1093/neuros/nyx 07428379547 PMC 5808499 · doi ↗ · pubmed ↗
