Using the predictive model of difficult endotracheal intubation to examine different simulators for airway management training: a pilot cross-sectional observational study
Ching-Hsiang Yu, En-Chih Liao, Yat-Pang Chau, Ming-Kun Huang, Ching-Yi Shen, Ding-Kuo Chien

TL;DR
This study evaluates how well a scoring system predicts the difficulty of endotracheal intubation training using different manikins.
Contribution
A novel scoring system is proposed to objectively assess the difficulty of airway management training simulators.
Findings
Model C had the highest difficulty score, longest intubation time, and lowest success rate.
Subsequent intubation attempts showed reduced time and complexity compared to the first attempt.
The scoring system aligned well with actual training outcomes, supporting its potential use for simulator evaluation.
Abstract
In recent years, Taiwan’s medical education has increasingly emphasized simulated learning, particularly through advanced manikins designed for procedural training, including endotracheal intubation. Although key indicators and predictive techniques for assessing complexity have been documented, their use in evaluating these manikins remains notably lacking.The aim of this study was to appraise the potential association between our devised scoring system and the actual outcome of intubation procedures. Subsequently, this scoring system could potentially serve as an objective yardstick for quantifying the intricacy of training simulators. Nineteen post-graduate or emergency medicine trainees participated in this study. Intubation training involved four manikins, each with varying difficulty scores based on neck circumference, thyromental distance, airway obstruction, and Mallampati…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4- —Mackay Medical College
- —Mackay Memorial Hospital
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAirway Management and Intubation Techniques · Simulation-Based Education in Healthcare · Tracheal and airway disorders
Background
Ensuring adequate oxygenation in emergency and critically ill patients is a fundamental responsibility for airway healthcare practitioners [1]. However, endotracheal intubation can be particularly challenging for medical students and junior residents due to its complexity, the absence of a universally standardized algorithm, and limited clinical exposure [2, 3]. To address these challenges, medical education in Taiwan has increasingly incorporated simulation-based training, which has been shown to improve proficiency in invasive procedures, including endotracheal intubation [4].
Despite the benefits of simulation-based training, the effectiveness of airway management simulators may be influenced by anatomical differences between populations. Most existing studies on airway management devices and training models have been conducted in predominantly Caucasian populations, raising concerns about their applicability to Asian patients due to anatomical variations [5].Previous research on obstructive sleep apnea (OSA) has highlighted significant differences in upper airway anatomy between Asian and Caucasian populations, emphasizing the need for population-specific considerations in airway management [6].
Several predictive models have been developed to assess the difficulty of endotracheal intubation in clinical settings, with some models specifically designed for Asian populations [7, 8]. However, these predictive models have not been systematically utilized to evaluate the suitability of different simulation models for airway management training.
In our previous study, we developed a predictive model incorporating four key variables—body mass index (BMI), thyromental distance, upper airway obstruction, and Mallampati classification—to assess intubation difficulty [9]. This model demonstrated strong predictive performance, with a sensitivity of 79.5% and a specificity of 81.7%.
The aim of this study is to examine the applicability of our predictive model in assessing the suitability of different airway management simulators. Specifically, we seek to determine whether the intubation difficulty predicted by our model aligns with actual intubation experiences in a simulated training setting. By addressing this gap in the literature, our study provides insights into the effectiveness of existing airway training models and their relevance to diverse patient populations.
Methodology and methods
Evaluation of the normal and difficult training models for intubation
To substantiate the authenticity of our prognosticative model and intricate formula devised for anticipating intricate intubation scenarios, we deliberately selected a quartet of simulated models tailor-made for intubation exercises. A total of five different intricate intubation training models were selected for training purposes, including Model A for standard adults, Model B, C, and D for advanced adults, as well as Model D represents a small adult airway, specifically designed to facilitate training in more challenging airway management scenarios. This model was selected for our study because it provides a higher level of difficulty, making it ideal for testing trainees’ proficiency in difficult airway situations. Detailed information on brands, models, and specifications, including Mallampati score, inter-incisor distance, thyromental distance, neck circumference, and estimated BMI, is listed in Table 1. These models have garnered widespread employment within the realms of medical pedagogy and emergency personnel training. Subsequently, we undertook the meticulous measurement of several anatomical dimensions, encompassing inter-incisor span, thyromental distance, and neck circumference (Fig. 1). Furthermore, we documented the Mallampati score and the Cormack & Lehane classification. To estimate BMI, we employed a calculation entailing neck circumference multiplied by a factor of 0.7. This approach was rooted in the precedent set by previous investigations, wherein the Beta coefficient establishing the relationship between neck circumference and BMI resided within the range of 0.6 to 0.8. As a culmination of our methodology, we proceeded to compute the intricate score about arduous endotracheal intubation. This intricate score was derived utilizing the formula outlined in the ensuing discourse:
Table 1. The characteristics of the four airway training models
Fig. 1. The definitions of inter-incisor distance, thyromental distance, and neck circumference. The distance between the upper and lower incisors with the mouth opened the widest was defined as the inter-incisor distance, and the distance between the prominent point of the mentum to the thyroid notch was the thyromental distance, the neck circumference was measured below the laryngeal prominence and at the level of the mid-cervical spine
Score for difficult endotracheal intubation = (neck circumference * 0.7 * 0.239) + (thyromental distance * -0.488) + (upper airway obstruction * 1.396) + (Mallampati grade 3/4 * 1.642).
Study subjects and process of endotracheal intubation test
This was a cross-sectional study designed to evaluate different simulators for airway management training among emergency medicine residents and post-graduate year (PGY) doctors. Emergency medicine is a board-certified specialty under Taiwan’s Ministry of Health and Welfare. The study was conducted in the emergency department of a tertiary care hospital. Data collection took place during a simulation-based training session on April 8, 2021.
Eligibility criteria included being currently enrolled in a post-graduate or emergency residency program. The study involved the use of four distinct simulation stations, each equipped with a unique airway management model of varying complexity. Esteemed visiting staff members from the emergency department acted as examiners. Prior to the examination, participants received a detailed briefing on the standard intubation procedure. Each participant performed intubation on each model twice, with a 90-second time limit for each attempt. The following parameters were meticulously recorded by the examiners:
- Intubation time during the first attempt.
- Success rate of the first intubation attempt.
- Incidence of tooth injury during the initial endeavor.
- Occurrences of gastric insufflation (stomach bagging) during the initial attempt.
- Instances of uninflated cuff mishaps during the first attempt.
- Perceived difficulty level before intubation (from 1 to 5).
- Quality of laryngoscopy view obtained during the first attempt (from 1 to 4).
- Perceived difficulty subsequent to the first intubation attempt.
- Intubation time during the second attempt.
- Success rate of the second intubation attempt.
- Potential tooth injury during the second endeavor.
- Occurrences of gastric insufflation during the second attempt.
- Instances of uninflated cuff mishaps during the second attempt.
- Quality of laryngoscopy view obtained during the second attempt (from 1 to 4).
- Perceived difficulty after the second intubation attempt (from 1 to 5).
Data for each variable of interest were collected directly by the examiners during the simulation sessions. These data included objective measures such as intubation time and success rates, as well as subjective assessments like perceived difficulty and laryngoscopy view quality.
Statistical analysis
Statistical Package for the Social Sciences (SPSS) version 24.0 (IBM, Armonk, NY, USA) and RStudio version 2023.12.1 + 402 with packages of “DescTools” and “FSA” were used for data management and statistical analysis [10–12]. Categorical variables were represented using frequency distributions and reported as n (%), and continuous variables were reported as means ± standard deviation. Kruskal-Wallis tests and Wilcoxon signed-rank tests were used to compare differences between two, or more than two independent/dependent groups, then Bonferroni tests were used as post hoc tests for the comparisons among more than two groups. Statistical significance was set by a P value less than 0.05.
Results
Among the array of endotracheal intubation training models selected, encompassing the Laerdal^®^ (Stavanger, Norway)Airway Management Trainer (designated as model A), the Life/form^®^ (Nasco Healthcare, Fort Atkinson, WI, USA)Advanced Airway Larry Trainer Head with Stand (designated as model B), the Gaumard^®^(Miami, FL, USA) Adult Advanced Multipurpose Airway & Cardiopulmonary Resuscitation (CPR)Trainer (designated as model C), and the Life/form^®^ Adult Airway Management Trainer with Stand (designated as model D), a comprehensive collection of anatomical dimensions and physical observations were painstakingly documented, constituting Table 1.
All selected models exhibited a body mass index (BMI) of ≥ 25, indicative of overweight or obesity. Additionally, none of the models had upper airway obstruction. Model C had the highest calculated difficulty score (4.430), exceeding the threshold for difficult intubation (cut-off = 4), followed by Model D (3.841), Model B (2.781), and Model A (2.558).
A total of 19 physicians participated in the study, including two post-graduate year (PGY) doctors and 17 emergency medicine residents. Among the residents, four were in their first year (R1), five in their second year (R2), four in their third year (R3), and four in their fourth year (R4).
Table 2 compares fifteen intubation-related parameters across the four training models. The Kruskal-Wallis test was used to identify significant differences. The following parameters showed no statistically significant differences: uninflated cuff in the first attempt (P =.567), success rate of the second attempt (P =.052), stomach bagging in the second attempt (P =.143), and uninflated cuff in the second attempt (P =.288). Model C had a significantly longer intubation time in both attempts and a lower success rate on the first attempt. The laryngoscopic view grade and perceived difficulty were higher for Model C compared to other models. Additionally, Model C had a significantly higher stomach bagging rate in the first attempt compared to Model D (P =.032) and a significantly higher pre-intubation difficulty level compared to Models A and B (both P <.001). Conversely, Model A had the highest incidence of tooth injury in both the first and second attempts.
Table 2. Comparisons of intubation-related indicators among the four training modelsTraining modelABCDP valueIntubation time of the first attempt28.00 ± 9.1026.37 ± 9.8358.84 ± 22.63*33.89 ± 13.06< 0.001Success rate of the first attempt94.74%94.74%57.89%100.00%< 0.001Tooth injury on the first attempt36.84%^^5.26%0.00%0.00%< 0.001Stomach bagging on the first attempt5.26%5.26%26.32%^#^0.00%0.027Uninflated cuff in the first attempt5.26%5.26%0.00%0.00%0.567Difficulty before intubation2.68 ± 0.582.68 ± 0.754.05 ± 0.78^^P <.05 to model A and B
Table 3 compares fifteen intubation-related parameters across the five training levels. PGY doctors had a significantly higher stomach bagging rate during the first attempt compared to R2 and R3 (P =.021 and 0.029, respectively). The R4 group had a significantly higher rate of uninflated cuff incidents in the second attempt compared to R2 (P =.043).Seven intubation-associated parameters were analyzed across both intubation attempts using Wilcoxon-signed rank tests (Table 4). A statistically significant reduction was observed in both intubation time and perceived difficulty in the second attempt (P <.001 and P <.001, respectively).
Table 3. Comparisons of intubation-related indicators among different training levels of the operatorsPGYR1R2R3R4P valueIntubation time of first attempt46.88 ± 20.3531.31 ± 13.1934.90 ± 17.9932.44 ± 19.3143.88 ± 24.120.095Success rate of first attempt62.50%87.50%95.00%93.75%81.25%0.175Tooth injury in first attempt25.00%6.25%5.00%18.75%6.25%0.387Stomach bagging on first attempt37.50%^&^12.50%0.00%0.00%12.50%0.021Uninflated cuff in first attempt0.00%0.00%5.00%0.00%6.25%0.680Difficulty before intubation3.25 ± 0.713.25 ± 0.863.50 ± 1.052.81 ± 1.053.06 ± 0.680.315laryngoscopy view in first attempt2.25 ± 1.042.19 ± 0.912.15 ± 0.882.13 ± 0.892.06 ± 1.000.981Difficulty after first attempt3.38 ± 1.603.38 ± 0.962.90 ± 1.452.50 ± 1.323.31 ± 1.540.306Intubation time of second attempt46.38 ± 21.9127.81 ± 12.7729.35 ± 19.3432.06 ± 22.0531.75 ± 22.400.114Success rate of second attempt75.00%81.25%95.00%100.00%75.00%0.142Tooth injury in second attempt25.00%6.25%5.00%6.25%12.50%0.508Stomach bagging on second attempt25.00%12.50%0.00%0.00%18.75%0.096Uninflated cuff in second attempt0.00%0.00%0.00%0.00%18.75%^+^0.021laryngoscopy view in second attempt1.50 ± 0.762.13 ± 0.892.20 ± 1.061.94 ± 0.682.06 ± 1.120.483Difficulty after second attempt3.00 ± 1.413.13 ± 1.262.65 ± 1.462.44 ± 1.462.44 ± 1.370.500^&^P <.05 between PGY and R2, PGY and R3. ^+^P <.05 between R2 and R4
Table 4. Comparisons of intubation-related indicators between first and second attemptsIntubation attemptFirst attemptSecond attemptP valueIntubation time36.78 ± 19.4831.89 ± 19.94< 0.001Success rate87.0%87.0%1.000Tooth injury11.0%9.0%0.790Stomach bagging9.2%9.2%1.000Uninflated cuff3.0%4.0%0.773laryngoscopy view2.14 ± 0.912.03 ± 0.940.143Difficulty after attempt3.05 ± 1.382.70 ± 1.39< 0.001**P* <.05
Figures 2 and 3 illustrate variations in intubation-associated indicators between the first and second attempts. Model C had the longest intubation time in both attempts, but a general trend of decreased intubation time in the second attempt was observed across all models. A similar trend was noted for perceived difficulty.
Fig. 2. Comparisons of intubation-related indicators among 4 training models in the first and second attempt. (A) Intubation time. (B) Success rate. (C) Tooth injury. (D) Stomach bagging. * P <.05, ** P <.001
Fig. 3. Comparisons of intubation-related indicators among 4 training models in the first and second attempt. (A) Uninflated cuff. (B) Laryngoscopy view. (C) Difficulty after attempt. * P <.05, ** P <.001
Discussion
The study evaluated different simulators for airway management training by utilizing a predictive model of difficult endotracheal intubation. Among the four models (A, B, C, and D), Model C was the most challenging for intubation, exceeding the defined difficulty threshold. Significant differences were observed in intubation performance, particularly in terms of prolonged intubation duration and increased difficulty associated with Model C, while Model A had a higher incidence of tooth injury. Furthermore, differences in performance were noted between post-graduate year (PGY) doctors and senior residents, emphasizing the impact of training levels on intubation success. Our difficult intubation prediction scores correlated well with the actual testing results, supporting their utility in differentiating higher-order modules. Repeated intubation attempts led to improved performance, reinforcing the importance of structured and repeated training in airway management.
A constraint in designing a novel training model was the inability to assess the manikin’s BMI accurately. Due to material disparities, traditional BMI measurements were infeasible. Instead, we estimated BMI using neck circumference, based on prior research demonstrating a strong correlation between the two (Beta coefficient range: 0.62–0.83) [13, 14]. This approach allowed for a more realistic assessment of intubation difficulty and aligns with studies linking neck circumference to conditions such as obstructive sleep apnea [15, 16] and obesity [17–21].
Clinically, our findings have important implications for airway management training. Intubation time is widely used as an indicator of procedural success and patient outcomes [22, 23]. Our results emphasize that difficulty scores, in addition to time and success rates, are crucial metrics for evaluating training models. Interestingly, Model C was classified as the most challenging despite not having the highest BMI or shortest thyromental distance. This discrepancy highlights the importance of a multifactorial approach to assessing intubation difficulty rather than relying on individual anatomical predictors.
Prior studies have shown that simulation-based endotracheal intubation training improves success rates and reduces procedure time over time [24]. Our findings align with these results, suggesting that continued practice, even without specific interventions between training sessions, leads to skill enhancement and reduced perceived difficulty. This underscores the value of structured simulation-based learning in developing both technical proficiency and clinician confidence [25].
An ideal simulator should replicate clinical scenarios accurately [26]. Tooth injury is a recognized complication of intubation, particularly involving the upper incisors, which can have medicolegal consequences [27]. In our study, Model A uniquely incorporated an auditory alarm mechanism to signal tooth contact. Future simulator development could benefit from integrating pressure-sensitive bite apparatuses across multiple models to provide a more comprehensive evaluation of this risk [28].
Another key consideration is anatomical realism. The thyromental distances of our manikin models were longer than those observed in Chinese patient populations [9, 29]. Given prior research demonstrating racial differences in thyromental distance cut-off values for predicting difficult intubation [30, 31], future manikin designs for Chinese populations should adjust these dimensions accordingly.
Upper airway obstruction was incorporated into our predictive model but was not simulated in the manikins. Current airway training models do not typically include foreign body obstruction scenarios. However, integrating such a feature—similar to Laerdal’s Choking Charlie—could enhance training for complex airway management [32]. A manikin designed to simulate airway obstruction and removal techniques would provide valuable additional training opportunities [33].
This study has several limitations. First, our predictive model may be subject to parameter overcompensation. For example, the upper airway obstruction coefficient could significantly alter a model’s difficulty score if incorporated, potentially leading to misclassification [34]. Future refinement of the formula should address this issue. Second, our sample size of 19 physicians, while small, is consistent with previous simulation-based studies and provides meaningful preliminary data [35]. Given institutional constraints on participant availability, future multi-center studies could enhance the generalizability of our findings.
Since this is a pilot study, power analysis was not performed, consistent with best practices for feasibility studies [36]. However, based on the observed variability in performance, a future prospective study with an estimated effect size of 0.2952 would require 80 participants across five training levels (16 per group) to achieve adequate statistical power (Supplementary data 1 and 2). Incrementally increasing sample sizes in subsequent studies will allow for more robust validation of our findings.
In summary, this study highlights the importance of simulation-based airway management training and underscores the need for refined manikin designs that better replicate patient anatomy and procedural challenges, which aligns with prior research on airway management training. Patel et al. (2024) demonstrated that independent mastery learning with automated feedback significantly enhances intubation success, reinforcing the importance of structured practice [37]. Yau et al. (2021) highlighted real-time feedback as a key factor in improving proficiency across experience levels [38]. Additionally, Roach et al. (2024) found that different airway simulators improved intubation skills, underscoring the value of simulation-based training [39]. Therefore, our findings provide a foundation for further research into optimizing airway training models to improve intubation success rates and clinical preparedness.
Conclusions
Our findings confirm that Model C presents the greatest challenge for endotracheal intubation training, as evidenced by its significantly higher difficulty score (4.430), prolonged intubation duration, and increased perceived difficulty. This suggests that Model C may be particularly beneficial for advanced airway management training, equipping trainees with the skills necessary to manage difficult intubations in clinical practice. Additionally, when analyzing physicians across different levels of training in emergency medicine, we found no significant differences in intubation time or success rate, indicating that experience level alone may not be the primary determinant of intubation proficiency. This underscores the importance of structured training protocols and deliberate practice with high-difficulty models, such as Model C, in developing critical airway management skills, rather than relying solely on accumulated years of training.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Material 1
Supplementary Material 2
Supplementary Material 3
Supplementary Material 4
Supplementary Material 5
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Gonzalez H, Minville V, Delanoue K, Mazerolles M, Concina D, Fourcade OJA. Analgesia: the importance of increased neck circumference to intubation difficulties in obese patients. 2008;106(4):1132–6.10.1213/ane.0b 013e 318167965918349184 · doi ↗ · pubmed ↗
- 2Davies R, Ali N, Stradling JJT. Neck circumference and other clinical features in the diagnosis of the obstructive sleep Apnoea syndrome. 1992;47(2):101–5.10.1136/thx.47.2.101PMC 4635821549815 · doi ↗ · pubmed ↗
- 3Dancey DR, Hanly PJ, Soong C, Lee B, Shepard J Jr, Hoffstein VJC. Gender differences in sleep apnea: the role of neck circumference. 2003;123(5):1544–50.10.1378/chest.123.5.154412740272 · doi ↗ · pubmed ↗
- 4Ben-Noun L, Sohar E, Laor AJO. Neck circumference as a simple screening measure for identifying overweight and obese patients. 2001;9(8):470–7.10.1038/oby.2001.6111500527 · doi ↗ · pubmed ↗
- 5Nafiu OO, Burke C, Lee J, Voepel-Lewis T, Malviya S, Tremper KKJP. Neck circumference as a screening measure for identifying children with high body mass index. 2010;126(2):e 306–10.10.1542/peds.2010-024220603254 · doi ↗ · pubmed ↗
- 6Hingorjo MR, Qureshi MA, Mehdi AJJ-JPMA. Neck circumference as a useful marker of obesity: a comparison with body mass index and waist circumference. 2012;62(1):36.22352099 · pubmed ↗
- 7Özkaya İ. Tunçkale ajcejoph: neck circumference positively related with central obesity and overweight in Turkish university students: a preliminary study. 2016;24(2):91–4.10.21101/cejph.a 455527119655 · doi ↗ · pubmed ↗
- 8Gallarato I. Dental injury in general anesthesia: a comparison between direct laryngoscopy and Mc GRATH® VLS. 2018.
