Accelerating digital innovation in clinical neuropsychology: simulation approach to support medical device certification
Andrea Panzavolta, Federico Sternini, Paolo Caffarra, Dalila De Vita, Alessandra Dodich, Cristina Fonti, Federica L’Abbate, Luigi Lavorgna, Valentina Laganà, Camillo Marra, Costanza Papagno, Francesca Ferrari Pellegrini, Andrea Stracciari, Luigi Trojano, Tiziana Iaquinta

TL;DR
This paper introduces a simulation-based approach to validate a teleneuropsychology platform for medical device certification, aiming to accelerate digital innovation in clinical neuropsychology.
Contribution
The novel contribution is a simulation-based method for pre-clinical validation of a teleneuropsychology platform under European SaMD regulations.
Findings
Simulation-based validation achieved over 75% accuracy in representativeness according to expert evaluations.
The approach supports SaMD certification by providing credible and coherent virtual patient profiles.
The method can accelerate innovation in teleneuropsychology while ensuring safety and efficacy.
Abstract
In recent years, the digitization of neuropsychological procedures in memory clinics has become a major focus. Several teleneuropsychology platforms have been developed for testing patients with cognitive deficits, but only a few have been registered as medical devices (MD) and are available in clinical practice. Here, we present a novel simulation-based approach designed to test technical performance and provide pre-clinical validation of a novel teleneuropsychology platform (i.e., Tenèpsia®), as required for the certification of Software as a Medical Device (SaMD) under the European regulation 2017/745 (MDR). Six dummy cognitive profiles simulating virtual patients with different cognitive performances were created. Five internal and two external experts evaluated simulated performances for representativeness, coherence and credibility. One cognitively unimpaired and five mild…
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Dementia and Cognitive Impairment Research · Traumatic Brain Injury Research
1 Introduction
Prevention, early diagnosis and management of cognitive disorders are among the greatest global health and social care challenges of our time (1) and represent a crucial priority. Cognitive assessment with harmonized neuropsychological testing is the gateway in the diagnostic roadmap for patients with suspected cognitive decline: it is needed to define the syndromic presentation correctly and to direct the pathway to biological diagnosis (2–4). Nonetheless, the use of harmonized cognitive testing in clinics is hampered by a lack of time, resources and specific expertise, both in primary care and in specialised settings (5, 6).
Digital solutions may help overcome current limitations in clinical neuropsychology by measuring a broad range of cognitive abilities remotely, in less time and with fewer resources. Although the pandemic has expanded the offer of telemedicine services, harmonized and clinically validated digital teleneuropsychology batteries are still lacking. Crucially, too few software tools are certified for this purpose in Europe [see for example Lesoil et al. (7)]. One of the main obstacles is obtaining certification as software as a medical device (SaMD) under the European regulation 2017/745 (MDR). The procedure requires proof that the risk-benefit profile of the device is non-inferior to the currently available state of the art. This may be done through a comparison with the standard of care, but this approach is often expensive and time-consuming. Performance simulations are thus increasingly used in digital health to demonstrate SaMD reliability, reproducibility, and safety prior to clinical validation. They help ensure that simulated performance does not materially diverge from real-world performance, accelerate digital innovation, and offer better resilience to privacy attacks with a statistical validity comparable to the traditional approach (8).
In this study, we describe the methodological approach adopted for the analytical/technical and pre-clinical validation of a novel teleneuropsychology platform [i.e., Tenèpsia® (9)], already tested for usability and user-friendliness (10), to prove its technical performance and suitability for EU certification under the 2017/745 MDR.
2 Materials and methods
2.1 Creation of dummy users, simulation and evaluation of performance
In the first phase, we created dummy cognitive profiles and simulated performances of potential users of the digital platform to verify that the system operates as intended, before exposure to real patients or acquisition of clinical data and in line with the early-stage pre-clinical requirements for SaMD. Five internal experts from the Tenèpsia® working group (A.P., C.C., C.M., P.C., & S.C.) met twice a week for two months in a focus group to define dummy user profiles. These meetings involved both clinicians and software engineers, with the aim of jointly defining the expected cognitive patterns and translating them into simulated test scores based on literature and normative datasets. Since the Tenèpsia® platform was designed to test individuals with suspected mild cognitive impairment (MCI) (9), the relevant literature was considered as reference evidence [see Petersen et al. (11) for a review]. The average age of dummy users was derived from literature evidence (12) and the average education from the latest report of the Italian National Institute of Statistics [i.e., ISTAT (13)]. Simulated cognitive performances of dummy users were defined based on Italian normative data and error profiles reported in the literature (14–26). The mean and standard deviation for each neuropsychological test included in the platform (9) were considered as reference standards. In accordance with the international guidelines for neuropsychological testing (27), impaired scores in MCI were simulated as performances deviating two standard deviations from the mean score of cognitively unimpaired subjects. For those digital tests in which the number of items differed from the paper-and-pencil version [see Panzavolta et al. (9) for details], mean and standard deviation scores were normalized.
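The two-standard-deviation rule and the normalization step described above can be sketched as follows. This is a minimal illustration and not the authors' script: the normative values are invented, the `Norm`, `rescale_norm` and `impaired_score` names are hypothetical, proportional rescaling is only one plausible reading of "normalized", and the sketch assumes tests where higher scores indicate better performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Normative reference for one test (values used below are hypothetical)."""
    mean: float
    sd: float
    max_score: int  # maximum attainable score, e.g. the number of items


def rescale_norm(norm: Norm, digital_max: int) -> Norm:
    """Rescale mean and SD to a test version with a different item count.

    Assumption: simple proportional rescaling; the source does not state
    which normalization was actually applied.
    """
    factor = digital_max / norm.max_score
    return Norm(mean=norm.mean * factor, sd=norm.sd * factor, max_score=digital_max)


def impaired_score(norm: Norm, n_sd: float = 2.0) -> float:
    """Score lying n_sd standard deviations below the mean, clamped to the valid range.

    Assumption: higher scores mean better performance; for error or time
    measures the deviation would go in the opposite direction.
    """
    raw = norm.mean - n_sd * norm.sd
    return max(0.0, min(float(norm.max_score), raw))


# Hypothetical paper-and-pencil norm: mean 40, SD 8, 60 items.
paper_norm = Norm(mean=40.0, sd=8.0, max_score=60)
# Hypothetical digital version of the same test with 30 items.
digital_norm = rescale_norm(paper_norm, digital_max=30)

print(impaired_score(paper_norm))    # 40 - 2 * 8 = 24.0
print(impaired_score(digital_norm))  # 20 - 2 * 4 = 12.0
```

Clamping to the valid range keeps a simulated score from falling below zero when the normative standard deviation is large relative to the mean.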
After the definition of demographic features and simulated test scores of dummy profiles, the scenarios were run on the Tenèpsia® platform through automated scripts developed by the engineering team based on the target cognitive profiles identified by clinical experts. These scripts interacted with the platform as real users would (i.e., by generating plausible responses for each test on the platform). Simulated performances were thus directly executed by the software, making it possible to verify that the platform generated the expected cognitive outputs.
In the second phase, dummy user profiles were evaluated for representativeness (defined here as the extent to which the simulations accurately reflect user performances in a real-world context), coherence and credibility. Two experts in clinical neuropsychology external to the Tenèpsia® working group were involved in the evaluation phase to ensure the reliability of the simulations and to meet the standards required for EU SaMD certification (28). Both internal and external experts completed the evaluations for each dummy user. Data were collected through an ad hoc questionnaire implemented on Google Forms® (see Supplementary Table S1). Percentage accuracy in representativeness and average credibility and coherence scores were computed. After the evaluation of each dummy user, A.P. and F.S. conducted qualitative interviews to understand the reasons for the choices made in the questionnaire.
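As a rough illustration of what such a script might do, the sketch below generates item-level responses that add up to a target score and checks that the resulting score matches the target. It does not call the Tenèpsia® platform, whose interface is not described in the source: the test names, item counts and targets are hypothetical, and summing correct items stands in for the platform's own scoring.

```python
import random


def simulate_item_responses(n_items: int, target_correct: int, seed: int = 0) -> list[bool]:
    """Return n_items pass/fail responses containing exactly target_correct passes."""
    if not 0 <= target_correct <= n_items:
        raise ValueError("target_correct must lie between 0 and n_items")
    responses = [True] * target_correct + [False] * (n_items - target_correct)
    # Shuffle with a fixed seed so the error positions vary but runs stay reproducible.
    random.Random(seed).shuffle(responses)
    return responses


def run_scenario(targets: dict[str, tuple[int, int]], seed: int = 0) -> dict[str, int]:
    """Play every test of one dummy profile and return the score each response set yields.

    targets maps a test name to (number of items, target number of correct items).
    """
    observed = {}
    for index, (test_name, (n_items, target)) in enumerate(sorted(targets.items())):
        responses = simulate_item_responses(n_items, target, seed=seed + index)
        # Stand-in for the platform's scoring: count the correct responses.
        observed[test_name] = sum(responses)
    return observed


# Hypothetical targets for one dummy profile.
profile_targets = {"test_a": (15, 6), "test_b": (20, 12)}
observed_scores = run_scenario(profile_targets)
expected_scores = {name: target for name, (_, target) in profile_targets.items()}
print(observed_scores == expected_scores)
```

In a real verification run, the comparison at the end would be made against the score reported by the platform itself, which is the point of executing the scenarios through the software.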
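The two summary measures can be computed along the following lines. This is a sketch under stated assumptions: the expert answers and ratings are invented for illustration, the function names are hypothetical, and the two cut-offs (75% accuracy and a score of at least 4) are taken from the thresholds mentioned in the results and figure captions.

```python
from statistics import mean

ACCURACY_THRESHOLD = 75.0  # acceptability threshold for representativeness (%)
LIKERT_THRESHOLD = 4.0     # minimum mean score considered good


def representativeness_accuracy(intended: str, chosen: list[str]) -> float:
    """Percentage of experts whose chosen label matches the intended profile."""
    if not chosen:
        raise ValueError("at least one expert answer is required")
    hits = sum(1 for label in chosen if label == intended)
    return 100.0 * hits / len(chosen)


def summarize(intended: str, chosen: list[str], coherence: list[int], credibility: list[int]) -> dict:
    """Summarize one dummy user: accuracy, mean ratings and threshold checks."""
    accuracy = representativeness_accuracy(intended, chosen)
    mean_coherence = mean(coherence)
    mean_credibility = mean(credibility)
    return {
        "accuracy": accuracy,
        "mean_coherence": mean_coherence,
        "mean_credibility": mean_credibility,
        "accuracy_ok": accuracy >= ACCURACY_THRESHOLD,
        "ratings_ok": min(mean_coherence, mean_credibility) >= LIKERT_THRESHOLD,
    }


# Invented answers from five experts for a dummy user intended as aMCI-sd.
summary = summarize(
    intended="aMCI-sd",
    chosen=["aMCI-sd", "aMCI-sd", "aMCI-md", "aMCI-sd", "aMCI-sd"],
    coherence=[6, 5, 6, 7, 5],
    credibility=[5, 4, 5, 6, 4],
)
print(summary)
```

Computing the summary separately for internal and external experts, as the figures do, only requires calling `summarize` once per group.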
3 Results
3.1 Dummy user profiles
At the end of the focus group activities, internal experts agreed on the definition of six dummy users: one with a normal cognitive profile and five with different MCI profiles. This final set of six dummy users was selected to ensure full coverage of the cognitive profiles relevant for the intended use of the teleneuropsychology platform (i.e., the early detection of mild neurocognitive disorder, namely MCI) (11). Since our goal was the technical validation of the platform as required for MDR pre-clinical assessment, we included the minimum core set of cognitive profiles most frequently reported in the literature (11), so as to test the platform across distinct MCI patterns without redundancy while keeping expert-based evaluations feasible. The MCI dummy user profiles included: one amnestic single-domain MCI (aMCI-sd) with long-term episodic memory impairments only; three amnestic multi-domain MCI (aMCI-md) with memory plus executive/attention, language or visuo-spatial impairments; and one non-amnestic single-domain MCI (naMCI-sd) with executive/attention impairments only. The literature search identified an average age of 73.4 years for MCI individuals. For this average age, the reference education level identified by the 2020 ISTAT report was primary school (i.e., 5 years). Reference normative data for each test included in the Tenèpsia® platform were thus selected considering means and standard deviations of individuals aged 73 years with ≤5 years of education (see Supplementary Table S2 for simulated test scores).
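The six profiles can be written down as a small data structure, which also makes the labelling rule explicit (amnestic if memory is impaired, multi-domain if more than one domain is impaired). The profile contents below follow the description above; the class and function names are hypothetical and the structure is only an illustrative sketch, not the representation used by the platform.

```python
from dataclasses import dataclass

# Shared demographics of the dummy users, as reported in the text.
REFERENCE_AGE_YEARS = 73
MAX_EDUCATION_YEARS = 5


@dataclass(frozen=True)
class DummyProfile:
    label: str
    impaired_domains: frozenset[str]

    @property
    def amnestic(self) -> bool:
        return "memory" in self.impaired_domains

    @property
    def multi_domain(self) -> bool:
        return len(self.impaired_domains) > 1


def expected_label(profile: DummyProfile) -> str:
    """Derive the MCI subtype label from the set of impaired domains."""
    if not profile.impaired_domains:
        return "Normal"
    prefix = "aMCI" if profile.amnestic else "naMCI"
    suffix = "md" if profile.multi_domain else "sd"
    return f"{prefix}-{suffix}"


PROFILES = [
    DummyProfile("Normal", frozenset()),
    DummyProfile("aMCI-sd", frozenset({"memory"})),
    DummyProfile("aMCI-md", frozenset({"memory", "executive/attention"})),
    DummyProfile("aMCI-md", frozenset({"memory", "language"})),
    DummyProfile("aMCI-md", frozenset({"memory", "visuo-spatial"})),
    DummyProfile("naMCI-sd", frozenset({"executive/attention"})),
]

# Every stored label should agree with the label derived from the impaired domains.
labels_consistent = all(expected_label(p) == p.label for p in PROFILES)
print(labels_consistent)
```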
3.2 Representativeness, coherence and credibility
The accuracy in correctly identifying the representativeness of dummy users’ profiles was above 75% for both internal and external experts (Figure 1). In the post-test qualitative interviews, experts reported some difficulty with the classification because of the different options provided by the questionnaire. Coherence and credibility of dummy user profiles were good overall (i.e., scores ≥4 on the 7-point Likert questionnaire) for both internal and external experts (Figure 2). Post-test qualitative interviews revealed that the slightly lower credibility scores were due to low trust in simulations, as the experts were aware that they were being asked to judge dummy users.
Figure 1. Percentage accuracy in correctly identifying the representativeness of each dummy user profile. The y-axis reports the overall accuracy (%) in recognizing the cognitive profile of each dummy user. The x-axis shows the different dummy user profiles: 1° aMCI-sd, single-domain amnestic Mild Cognitive Impairment; 2° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus executive/attention; 3° naMCI-sd, single-domain non-amnestic Mild Cognitive Impairment, executive/attention; 4° Normal, cognitively unimpaired profile; 5° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus language; 6° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus visuo-spatial. Accuracy values are reported separately for internal experts and external experts. The horizontal dashed line indicates the predefined acceptability threshold (75%).
Figure 2. Average coherence and credibility scores for each dummy user profile. The y-axis displays the dummy user profiles: 1° aMCI-sd, single-domain amnestic Mild Cognitive Impairment; 2° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus executive/attention; 3° naMCI-sd, single-domain non-amnestic Mild Cognitive Impairment, executive/attention; 4° Normal, cognitively unimpaired profile; 5° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus language; 6° aMCI-md, multi-domain amnestic Mild Cognitive Impairment, memory plus visuo-spatial. The x-axis reports the mean coherence and credibility scores, assessed using a 7-point Likert scale, ranging from 0 (very low) to 7 (very high). Scores represent the average ratings provided by internal and external experts for each dummy user profile.
4 Discussion
In this study, we describe a simulation approach for the validation of a teleneuropsychology software as SaMD developed for remote cognitive testing in memory clinics [i.e., Tenèpsia® (9)]. Interest in such approaches is growing in the digital health field (29). As shown by a recent consensus study involving regulators, industry and academic experts (29), this well-recognized step within the MDR 2017/745 pathway is a valid method to provide early-stage evidence of SaMD technical performance and support pre-clinical assessment before the authorization of clinical investigations and real-patient testing. In teleneuropsychology, it may allow the verification of scoring logic, consistency of software performance metrics with real-world use, and functional robustness under controlled conditions, without exposing patients to the risks of an uncertified MD (28). Therefore, such preliminary MDR approaches enrich, rather than replace, clinical validation by demonstrating that the system can reproduce and discriminate heterogeneous cognitive profiles.
Our results indicate that the scores simulated in the platform are largely consistent with profiles of potential patients in real-world scenarios. The use of simulations with dummy users indeed provided good accuracy, together with coherent and credible simulated cognitive profiles, supporting the validation of the platform. Experts confirmed the adequacy of the simulation work through the accurate identification of profiles and their positive evaluation. The digital tests integrated into the platform effectively provide information comparable to that of their paper-and-pencil counterparts, ensuring equivalent results. Simulated avatars were validated both by internal experts from the study group and by external clinical experts not involved in the project, demonstrating good consistency and credibility in both groups.
This approach has many advantages in teleneuropsychology for both groups of final users, i.e., patients and clinicians. First, it avoids ethical concerns related to real-life testing of the digital solution on patients with cognitive impairment, such as the potential psychological burden, discomfort, or stress that the neuropsychological assessment may cause, especially in vulnerable populations like individuals with dementia or cognitive decline (8). Additionally, it avoids exposing human participants to unproven technologies, thus minimizing the risks associated with early-stage testing. The simulated approach also offers a time-saving and controlled environment for technical testing, free from the unpredictable factors that can arise in clinical settings. This is especially relevant in dementia, where cognitive deficits can affect the reliability of collected data (30). One of the main challenges in testing novel digital solutions is the variability in patient responses and clinical settings. Dummy patients ensure a reproducible and standardized setting, which enhances the reliability of the data collected. This consistency is crucial for regulatory bodies when assessing the validity and safety of a new device. In addition, generating synthetic data reduces the expenses related to data acquisition, storage, and cleaning, and enables the creation of reliable datasets that preserve statistical characteristics without compromising sensitive information. Finally, the controlled environment of simulated approaches also provides a consistent baseline for testing different MDs, making it possible to pinpoint and rectify problems more effectively before moving on to real-life trials. This accelerated process can lead to quicker innovations and faster market entry for new digital devices, ultimately benefiting both patients and clinicians.
Moreover, the availability of certified digital platforms may also have a significant impact in the neurorehabilitation setting, enabling continuous cognitive monitoring and the personalization of remote cognitive training programs, which are particularly valuable in long-term care pathways.
Some limitations of the current study should be acknowledged. According to the feedback provided by the experts, the lack of qualitative information on medical history and of observation of the patient's attitude during the neuropsychological examination may affect the credibility of simulated performances. Prior training on the evaluation questionnaire is also recommended to improve the reliability of experts’ evaluations. Future validation studies are also needed to evaluate the consistency of our anecdotal findings, address diagnostic comparability across different administration settings (hospital- vs. home-based), and test usability in the target population.
In conclusion, simulation approaches represent a promising avenue for accelerating digital innovation in clinical neuropsychology while saving the costs of direct comparison with the real-life standard of care. While direct comparison with the standard of care remains a critical component of MD validation, simulated approaches with dummy users provide a complementary pathway that addresses ethical, financial, and logistical challenges. This approach may also serve as a valuable tool for training and education. Neuropsychologists can use simulations to become familiar with new devices and procedures, ensuring a smoother transition from development to clinical practice. This dual benefit enhances both the validation process and the practical implementation of new technologies in teleneuropsychology. Digital platforms for diagnosing and monitoring cognitive disorders will certainly contribute to a transition towards a more sustainable and equitable future, especially for the most fragile segments of the population.
References
- 1. World Health Organization. Global status report on the public health response to dementia. Available online at: https://www.who.int/publications-detail-redirect/9789240033245 (Accessed July 4, 2023).
- 2. Sachdev PS, Blacker D, Blazer DG, Ganguli M, Jeste DV, Paulsen JS, et al. Classifying neurocognitive disorders: the DSM-5 approach. Nat Rev Neurol. (2014) 10:634–42. doi: 10.1038/nrneurol.2014.181. PMID: 25266297.
- 3. Frisoni GB, Boccardi M, Barkhof F, Blennow K, Cappa S, Chiotis K, et al. Strategic roadmap for an early diagnosis of Alzheimer’s disease based on biomarkers. Lancet Neurol. (2017) 16:661–76. doi: 10.1016/S1474-4422(17)30159-X. PMID: 28721928.
- 4. Boccardi M, Monsch AU, Ferrari C, Altomare D, Berres M, Bos I, et al. Harmonizing neuropsychological assessment for mild neurocognitive disorders in Europe. Alzheimer’s Dementia. (2022) 18:29–42. doi: 10.1002/alz.12365. PMCID: PMC9642857. PMID: 33984176.
- 5. Bradford A, Kunik ME, Schulz P, Williams SP, Singh H. Missed and delayed diagnosis of dementia in primary care: prevalence and contributing factors. Alzheimer Dis Assoc Disord. (2009) 23:306–14. doi: 10.1097/WAD.0b013e3181a6bebc. PMID: 19568149. PMCID: PMC2787842.
- 6. de Levante Raphael D. The knowledge and attitudes of primary care and the barriers to early detection and diagnosis of Alzheimer’s disease. Medicina (Kaunas). (2022) 58:906. doi: 10.3390/medicina58070906. PMID: 35888625. PMCID: PMC9320284.
- 7. Lesoil C, Bombois S, Guinebretiere O, Houot M, Bahrami M, Levy M, et al. Validation study of “Santé-Cerveau”, a digital tool for early cognitive changes identification. Alz Res Therapy. (2023) 15:70. doi: 10.1186/s13195-023-01204-x. PMCID: PMC10068729. PMID: 37013590.
- 8. Murtaza H, Ahmed M, Khan NF, Murtaza G, Zafar S, Bano A. Synthetic data generation: state of the art in health care domain. Comput Sci Rev. (2023) 48:100546. doi: 10.1016/j.cosrev.2023.100546.
