Evaluating a cost-effective, web-based AI platform for lateral cephalometric analysis: a comparative in-silico study
Omar Ali Kamel Eldeeb, Yomna Mohamed Yacout, Hassan Mohamed Abouelkheir, Yousria Salah ElDin Gaweesh, Noha M. Elkersh

TL;DR
This study compares a free web-based AI tool for dental analysis with a gold standard and a commercial AI tool, finding the free tool performs well and could be useful in low-income areas.
Contribution
The study introduces and validates a cost-effective, web-based AI platform (WebCeph) for cephalometric analysis in resource-limited settings.
Findings
WebCeph showed fewer and smaller discrepancies compared to manual tracing than Carestream.
WebCeph outperformed Carestream in several key cephalometric parameters.
Statistically significant differences were found in 11 out of 18 measurements across the three methods.
Abstract
Despite the current popularity of artificial intelligence (AI)- assisted cephalometric analysis, its cost precludes its widespread use in low-income countries. Free alternatives are available but mostly lack rigorous validation. Thus, this study aimed to assess the precision of AI-assisted tracing using a freely available web-based platform (WebCeph) (Artificial Intelligence Orthodontic & Orthognathic Cloud Platform, South Korea) by comparing it with the gold standard (manual tracing on OnyxCeph software) (Image Instruments, Chemnitz, Germany). It was also compared with commercially available AI-assisted tracing software (Carestream) (Carestream Dental LLC, Atlanta, GA, USA). Lateral cephalometric radiographs of 57 randomly selected patients were included in this study. The radiographs were obtained from the archives of the Radiology unit, Faculty of Dentistry, Alexandria University.…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3- —Alexandria University
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDental Radiography and Imaging · Dental Research and COVID-19 · Forensic Anthropology and Bioarchaeology Studies
Introduction
Since its introduction, lateral cephalometric radiography has been used primarily to assess dental and skeletal changes. Currently, it plays a significant role in orthodontic diagnosis and treatment planning [1]. As it enables the evaluation of the skeletal relationships between the maxilla, mandible, and cranial base in both vertical and sagittal planes, along with the dental relationships between the upper and lower teeth and the skeletal structures [2]. Conventional cephalometric analysis involves tracing of anatomical landmarks on an acetate paper followed by measurement of the cephalometric parameters. This method is time-consuming and heavily relies on the experience and knowledge of the clinician [3].
Recent software developments and advancements in machine learning have led to a digital revolution and to the introduction of the Artificial intelligence (AI) era. AI is the machine’s capability to imitate human behavior, including complex activities and making decisions based on preset standards [4]. Its use is rapidly evolving in many fields. In the medical field it is mainly used in gastroenterology, cardiology, and radiology [5–7]. In dentistry, AI is used for several functions, such as labeling teeth, spotting cavities, identifying periapical conditions, assessing the relationship between the mandibular canal and third molars, analyzing osteoporotic alterations, planning dental implants, investigating jaw tumors, and carrying out cephalometric analysis [8, 9].
Convolutional neural networks (CNNs) are specialized versions of machine learning methods that are the most commonly employed algorithms for image analysis in AI [10, 11]. One of the forms of image analysis using AI is AI-aided cephalometric analysis. The AI-aided cephalometric analysis has gained attention because it minimizes manual tracing-related problems such as subjectivity, limited reproducibility, and being an operator-dependent time-consuming process [12, 13]. Automated landmark localization has been the subject of numerous AI investigations [14–20]. AI landmark identification showed supremacy over traditional cephalometric landmark identification in terms of efficiency and repeatability [17–21].
Recently, AI has become the most studied digital technology in Orthodontics, as its integration into orthodontics is leading to a paradigm shift in the field. As AI continues to advance, it is expected to have a bigger impact on the future of orthodontics. Hence, it is imperative for orthodontists and dental students to cope with the latest advancements and to develop a solid understanding of the digital technologies [22]. Although AI-assisted lateral cephalometric analysis saves a lot of time prior to treatment, there are limitations to implementing this technology in every dental office, especially in low-income countries and the developing world. These limitations are closely related to the economic conflicts that often stem from limited resources. Such limited resources can lead to the inability to integrate advanced digital and AI systems in the daily dental practice [23]. To address these challenges, it is essential to adopt cost-effective solutions that make the most of the available resources. For that reason, it becomes substantial to evaluate the precision of these cost-effective freely available solutions and to compare their performance with the commercially available ones.
Multiple AI-based software for automatic cephalometric analysis has emerged recently, some of which are commercial and desktop-based while others are freely available and cloud-based [24, 25]. Despite the plethora of such software, there is a critical lack of direct comparisons in the literature between the free and the commercial software. To the best of our knowledge no studies have compared a freely available web-based platform for AI cephalometric tracing with both commercially available desktop-based software and gold standard manual cephalometric tracing. Therefore, the aim of the current study was to assess the performance of AI-assisted cephalometric analysis using a freely available web-based platform (WebCeph) by comparing it with the manual tracing (Gold standard) using OnyxCeph software (Image Instruments, Chemnitz, Germany). Moreover, it was compared with the commercially available desktop-based software (Carestream). The study’s null hypothesis was that there would be no significant difference in the agreement between the three methods of cephalometric tracing.
Materials and methods
Study design
The present study was a comparative in silico study and was conducted on 57 lateral cephalometric radiographs collected from Oral and Maxillofacial Radiology department, Faculty of Dentistry, Alexandria University. The study was approved by the Scientific Research Ethics Committee, Faculty of Dentistry, Alexandria University (IORG: 0008839), Ethics committee (No: 0933-06/2024). Cephalometric tracing was performed on the radiographs using 3 methods, one was considered as the control group and two were set as test groups. Digital manual tracing by OnyxCeph software was considered as the control group [26], freely available web-based AI cephalometric tracing platforms by WebCeph was set as test group 1 and commercial desktop-based AI software by Carestream was set as test group 2. The measurements of each method were then compared.
Radiographs of individuals with dental and skeletal Class I, II, or III malocclusion were included in the study, while radiographs of patients with any dentofacial disorder or congenital abnormality or individuals with mixed dentition were excluded from the study. Cephalometric radiographs with low quality and motion artifacts were also excluded, there was no restriction regarding patients’ ethnicity or gender when selecting the radiographs.
Sample size calculation
Sample size was estimated based on assuming confidence level = 95% and study power = 80%. According to Mahto et al. [27], the Intra Class Correlation Coefficient (ICC) value of cephalometric measurements for SNA between manual tracing and AI software was 0.879, with an ICC value of 0.75 was considered as the minimum value for good agreement [28]. The minimum sample size was calculated to be 51 digital lateral cephalometric images, this was increased to 57 digital images to make up for any processing errors. Sample size was based on Rosner’s method [29] calculated by Wnarifin Online Calculator for Intraclass Correlation Coefficient (ICC) - Hypothesis Testing [30].
Data collection
Fifty seven pretreatment lateral cephalograms were obtained from the database of Oral and Maxillofacial Radiology department, Faculty of Dentistry, Alexandria University. The patients’ consent was obtained before the radiographic exposure and the patients granted permission for the use of the de-identified radiographs for research purposes. All the radiographs were taken with the patient standing upright while keeping the teeth in centric occlusion, the Frankfort plane parallel to the floor, and the lips in resting position. All the radiographs were acquired by Carestream CS 8100 X-ray machine (Carestream Dental LLC, Atlanta, GA, USA).
Cephalometric analysis
Digital manual cephalometric analysis method (control group)
For this technique, all the JPG images of the radiographs were transferred to the OnyxCeph^3^™ (Image Instruments, Chemnitz, Germany) dental analysis software for cephalometric analysis. Digital measurements were evaluated using a Dell P2217Hb monitor. Consistency in landmark identification was ensured by clearly defining the selected anatomical landmarks before the study commencement [31]. The identified landmarks [Supplementary file 1] were marked using an indicator on the mouse control with any bilateral structure being averaged to make only one landmark. A ruler on the cephalostat was calibrated in the program before marking the anatomical points. Measurements were calculated automatically by the program after marking all the landmarks (Fig. 1).
Fig. 1. Steiner (A) and Tweed (B) analyses performed using OnyxCeph software showing landmark placement and measurement outputs
All tracings were done by only one radiologist. No more than 6 radiographs were traced per day to prevent examiner fatigue. A total of 18 cephalometric measurements with 5 linear, 12 angular measurements and 1 ratio were evaluated. In these measurements, 11 parameters were skeletal and 7 parameters were dental. (Table 1) [31]. The abbreviations used by each software to denote the measured cephalometric readings are presented in Supplementary file 2.
Table 1. Measurements used in this studyMeasurementDefinitionSteiner analysis SNA°The angle between the SN plane and the NA line SNB°The angle between the SN plane and the NB line ANB°The angle between the NA and NB lines II°Interincisal angle measured as the angle between the most prominent maxillary and mandibular incisors SN-OcP°The angle between SN plane and the occlusal plane SN-GoGn°Mandibular plane angle measured as the angle between the mandibular plane (Gonion-Gnathion) and SN plane Max1-NA°The angle formed by the intersection of the NA line with the maxillary incisor long axis Mand1-NB°The angle formed by the intersection of the NB line with the mandibular incisor long axis 1u-NA (mm)The linear distance from the incisal edge of the maxillary central incisor to NA line 1 L-NB (mm)The linear distance from the incisal edge of the mandibular central incisor to NB line Pog-NB (mm)The linear distance from the pognion point to the NB lineTweed analysis FMIA°The angle formed between the Frankfort horizontal plane and long axis of mandibular incisor FMA°Mandibular plane angle measured as the angle between the Frankfort horizontal plane and the mandibular plane IMPA°The angle formed between the mandibular plane and the long axis of the mandibular incisor POr-OcP°Cant of the occlusal plane measured as the angle between the Frankfort horizontal plane and the occlusal plane PFH (mm)Posterior facial height measured as the linear distance from the Articulare point to the Gonion point AFH (mm)Anterior facial height measured as the linear distance between Menton and a line connecting ANS and PNS AFH/PFHFacial height index calculated as the ratio between the PFH to the AFH
Web-based AI cephalometric analysis method (test group 1)
For this technique the same radiographs were uploaded to the WebCeph™ (Artificial Intelligence Orthodontic & Orthognathic Cloud Platform, South Korea), free online cephalometric analysis service. A custom analysis was done on this platform to produce all the measurements simultaneously (Fig. 2).
For this method radiographs were only manually calibrated using a ruler on the cephalostat, then all the cephalometric landmarks were identified and digitized and measurements produced automatically by AI without any human intervention.
Fig. 2. Steiner and Tweed analyses in Webceph platform showing automatic landmark placement and measurement outputs
Desktop-based AI cephalometric analysis method (test group 2)
For this technique Carestream imaging software V8 (Carestream Dental LLC, Atlanta, GA, USA) was used. All radiographs were only manually calibrated using a ruler on the cephalostat. All the cephalometric landmarks were identified and digitized and measurements produced automatically by AI without any human intervention (Fig. 3).
Fig. 3. Steiner (A) and Tweed (B) analyses performed using Carestream software showing landmark placement and measurement outputs
For the two AI groups, the cephalometric analysis process was fully automated as the software placed all landmarks without any manual correction or intervention from the operator.
Reliability assessment
Before the study commencement, the researcher who performed the manual cephalometric tracing (OE) was calibrated by repeatedly performing the cephalometric analysis to ensure correct placement of the landmarks and the landmark positions were validated by an experienced orthodontist (YY).
During the study, 12 images were randomly chosen and retraced by the same researcher and by a different researcher, with a 2-week interval between assessments to assess intra- and inter-examiner reliability, respectively.
Blinding
The researcher manually placed the cephalometric landmarks only in OnyxCeph software, therefore, blinding of the researcher during landmark identification was not possible. Moreover, because cephalometric measurements were automatically generated using the three software, there was no chance of human bias influencing the results, thus negating the need for blinding during this step. Nevertheless, while operator blinding was not possible, the order of analysis in the different software was randomized to prevent systematic bias. In addition, the statistician was blinded during data assessment.
Statistical analysis
Normality of all parameters were checked using Shapiro Wilk test and Q-Q plots. Normal distribution was approved for all parameters except ANB, 1u-NA, 1 L-NB and Pog-NB. All data were presented using mean, standard deviation (SD) in addition to median and inter quartile range (IQR). Repeated measures ANOVA and Friedman test followed by pairwise comparisons with Bonferroni correction to adjust for type I error was used for comparison between groups. Mean differences and 95% Confidence intervals were calculated. Intra and inter examiner reliability were examined using Intra Class Correlation Coefficient (ICC). In addition, ICC was used to determine the agreement between the measurements obtained using the three methods of cephalometric analysis. All tests were two tailed and the significance level was set at p value < 0.05. Data was analyzed using IBM SPSS, for Windows, version 23, Armonk, NY, USA.
Results
The ICC values showed intra-examiner reliability ranging between (0.768–0.993) for all measurements indicative of good to excellent intra-observer reliability [28]. The ICC of the mean measurement values between the two researchers ranged between (0.895–0.996) also indicative of good to excellent inter-observer reliability with the exception of 1u-NA which was (0.741) indicating moderate inter-observer reliability for this measurement (Table 2) [28].
Table 2. Inter-examiner reliability and intra-examiner reliability of all parametersIntra-examinerInter-examinerICC95% CIP valueICC95% CIP valueSNA0.9790.928, 0.994< 0.0010.9440.891, 0.984< 0.001SNB0.9870.954, 0.996< 0.0010.9290.774, 0.979< 0.001ANB0.9820.940, 0.995< 0.0010.9720.905, 0.992< 0.001II0.9880.961, 0.997< 0.0010.9890.961, 0.997< 0.001SN-OcP0.9610.871, 0.989< 0.0010.9750.916, 0.993< 0.001SN-GoGn0.9830.944, 0.995< 0.0010.9630.878,0.989< 0.001Max1-NA0.9730.910, 0.992< 0.0010.9760.918, 0.993< 0.001Mand1-NB0.9910.970, 0.998< 0.0010.9900.965, 0.997< 0.0011u-NA0.7680.375, 0.9270.0010.7410.319, 0.918< 0.0011 L-NB0.9190.743, 0.976< 0.0010.9620.874, 0.989< 0.001Pog-NB0.8570.578, 0.957< 0.0010.8950.678, 0.969< 0.001FMIA0.9930.977, 0.998< 0.0010.9910.968, 0.997< 0.001FMA0.9440.819, 0.984< 0.0010.9440.817, 0.983< 0.001IMPA0.9920.973, 0.998< 0.0010.9900.966, 0.997< 0.001POr-OcP0.9080.712, 0.972< 0.0010.9260.764, 0.978< 0.001PFH0.9600.868, 0.988< 0.0010.9620.873, 0.989< 0.001AFH0.9880.957, 0.996< 0.0010.9960.986, 0.999< 0.001AFH/PFH0.9780.927, 0.994< 0.0010.9750.915, 0.993< 0.001**ICC Intra Class Coefficient, CI Confidence Interval*Statistically significant difference at p value < 0.05
Comparison of the mean cephalometric measurements revealed significant differences between the three methods of cephalometric tracing. The results for Steiner and Tweed analysis are shown in Tables 3 and 4, respectively. Pairwise comparisons of Steiner measurements among the three software programs are shown in Table 5. The results showed that the magnitude of the mean differences between both AI software versus manual tracing was generally below the clinical threshold of 2 mm/degrees, except for Max1-NA and Mand1-NB. Pairwise comparisons of Tweed measurements among the three software programs are shown in Table 6. The results demonstrated significant differences in 5 parameters between Carestream software and the manual gold standard that considerably exceeded the 2 mm/degrees clinical threshold (FMA, IMPA, Por-OcP, PFH, AFH/PFH), whereas the differences between WebCeph and the manual gold standard were not significant. The ICC values of the cephalometric measurements obtained using the three methods are reported in supplementary file 3. For the values of WebCeph versus manual tracing, all the measurements showed ICC values above 0.8, indicating good to excellent agreement [28], except 1u-NA which showed an ICC value of 0.529. On the other hand, the ICC values of Carestream versus manual tracing ranged between 0.386 and 0.833, indicating poor to good agreement [28], with only the AFH showing an ICC value of 0.970.
Table 3. Comparison of Steiner measurements between Onyxceph, Carestream and Webceph Control group
Test groups Onyxceph(n=57)Carestream(n=57)Webceph(n=57)P valueMean ±SDMean ±SDMean ±SDSNA°82.26 ±4.9083.07±5.0282.85±3.49 0.467SNB°77.63 ± 4.3678.09 ± 7.0477.37 ± 3.520.534ANB°^ⱡ^4.64 ± 3.664.46 ± 3.345.47 ± 3.29< 0.001II°126.38 ± 13.02126.47 ± 20.35129.89 ± 11.470.171SN-OcP°19.15 ± 6.2317.09 ± 7.8419.39 ± 5.870.035SN-GoGn°34.86 ± 5.9835.12 ± 7.0233.96 ± 6.810.214Max1-NA°20.34 ± 8.6821.36 ± 7.8417.95 ± 7.750.002Mand1-NB°28.69 ± 8.0425.62 ± 8.2126.68 ± 7.270.0061u-NAmm^ⱡ^4.33 ± 2.933.51 ± 2.222.68 ± 2.07< 0.0011 L-NBmm^ⱡ^5.88 ± 2.545.30 ± 2.446.09 ± 2.830.063Pog-NBmm^ⱡ^1.04 ± 1.451.22 ± 0.861.05 ± 1.550.520ⱡFriedman Test, all other p values are obtained from Repeated Measures ANOVAStatistically significant difference at p value < 0.05
Table 4. Comparison of Tweed measurements between Onyxceph, Carestream and WebcephControl groupTest groupsOnyxceph(n = 57)Carestream(n = 57)Webceph(n = 57)P value^ⱡ^Mean ± SDMean ± SDMean ± SDFMIA°57.09 ± 8.9457.05 ± 8.6459.45 ± 8.210.049FMA°28.30 ± 6.6833.29 ± 5.4326.52 ± 6.27< 0.001IMPA°94.63 ± 9.4889.61 ± 8.6694.04 ± 9.38< 0.001POr-OcP°10.96 ± 5.2013.13 ± 5.7110.65 ± 4.300.001PFHmm41.75 ± 5.7138.35 ± 2.5640.89 ± 5.12< 0.001AFHmm61.32 ± 6.9261.83 ± 5.7861.47 ± 6.880.120AFH/PFH68.60 ± 9.1662.32 ± 4.6266.83 ± 7.44< 0.001ⱡRepeated Measures ANOVA*Statistically significant difference at p value < 0.05
Table 5. Pairwise comparisons and mean differences of Steiner measurements between Onyxceph, Carestream and WebcephParametersGroupsCompared toMean diff95% CIp valueSNA°OnyxcephCarestream−0.81−2.84, 1.22-Webceph−0.58−1.71, 0.54-CarestreamWebceph0.23−1.47, 1.92-SNB°OnyxcephCarestream−0.47−2.83, 1.90-Webceph0.25−0.49, 0.99-CarestreamWebceph0.72−1.46, 2.89-ANB°OnyxcephCarestream0.18−0.79, 1.141.00Webceph−0.84−1.54, −0.140.001CarestreamWebceph−1.01−1.72, −0.30< 0.001II°OnyxcephCarestream−0.09−6.31, 6.12-Webceph−3.51−5.67, −1.35-CarestreamWebceph−3.42−9.20, 2.36-SN-OcP°OnyxcephCarestream2.05−0.60, 4.710.183Webceph−0.25−1.21, 0.721.00CarestreamWebceph−2.30−4.68, 0.080.061SN-GoGn°OnyxcephCarestream0.25−1.81, 2.32-Webceph1.160.25, 2.07-CarestreamWebceph0.91−0.94, 2.75-Max1-NA°OnyxcephCarestream−1.03−3.58, 1.530.980Webceph2.380.31, 4.460.019CarestreamWebceph3.411.05, 5.770.002Mand1-NB°OnyxcephCarestream3.070.60, 5.540.010Webceph2.010.88, 3.14< 0.001CarestreamWebceph−1.06−3.45, 1.330.8391u-NAmmOnyxcephCarestream0.83−0.12, 1.770.569Webceph1.650.72, 2.59< 0.001CarestreamWebceph0.830.19, 1.470.0301 L-NBmmOnyxcephCarestream0.57−0.04, 1.19-Webceph−0.22−0.62, 0.19-CarestreamWebceph−0.79−1.45, −0.13-Pog-NBmmOnyxcephCarestream−0.19−0.50, 0.13-Webceph−0.02−0.29, 0.25-CarestreamWebceph0.17−0.16, 0.50-CI Confidence Interval*Statistically significant difference at p value < 0.05
Table 6. Pairwise comparisons and mean differences of tweed measurements between Onyxceph, Carestream and WebcephParametersGroupsCompared toMean diff95% CIp valueFMIA°OnyxcephCarestream0.04−3.06, 3.131.00Webceph−2.36−3.64, −1.08< 0.001CarestreamWebceph−2.39−5.12, 0.330.104FMA°OnyxcephCarestream−4.99−6.72, −3.27< 0.001Webceph1.780.58, 2.980.002CarestreamWebceph6.775.23, 8.31< 0.001IMPA°OnyxcephCarestream5.032.27, 7.79< 0.001Webceph0.60−0.78, 1.970.865CarestreamWebceph−4.43−7.29, −1.580.001POr-OcP°OnyxcephCarestream−2.17−4.14, −0.190.027Webceph0.32−0.88, 1.521.00CarestreamWebceph2.481.00, 3.96< 0.001PFHmmOnyxcephCarestream3.401.99, 4.81< 0.001Webceph0.87−0.21, 1.950.155CarestreamWebceph−2.55−3.83, −1.24< 0.001AFHmmOnyxcephCarestream−0.51−1.22, 0.20-Webceph−0.15−0.60, 0.30-CarestreamWebceph0.36−0.28, 1.00-AFH/PFHOnyxcephCarestream6.283.84, 8.71< 0.001Webceph1.77−0.14, 3.670.078CarestreamWebceph−4.51−6.52, −2.50< 0.0013*Statistically significant difference at p value < 0.05
Discussion
Cephalometric analysis is an integral part of orthodontic treatment planning. Conventionally, manual cephalometric analysis is performed using acetate paper, ruler and protractor. However, it has limited reproducibility upon repetition and consumes too much time [3]. Currently, the use of AI for automatic tracing of cephalometric radiographs is showing great promise to overcome the limitations of conventional manual tracing [27]. Although a previous study showed that AI-automated cephalometric analysis was less accurate and slower compared to manual tracing performed by skilled clinicians, the ongoing technological advances continue to improve the performance of AI [32]. Recent research has demonstrated that automatic cephalometric analysis is superior in terms of repeatability, and it saves time as manual tracing takes almost 80 times as long as the AI software takes to trace a cephalogram [1]. Consequently, a number of fully automated AI-driven cephalometric analysis software and platforms have been developed [33, 34].
However, economic issues, especially in low-income countries and governmental institutions, hinder the utilization of commercially available AI-assisted software and their periodic license renewals present an additional burden that cannot be afforded most of the time. Little data is available in the literature about the precision of freely available platforms for automatic cephalometric analysis and their applicability as alternatives to the commercial software. This study aimed to assess the performance of AI-assisted cephalometric analysis using a freely available web-based platform in comparison to manual tracing, and to compare its performance with another commercially available desktop-based software. Significant differences were found between the three methods of tracing, hence, the null hypothesis was rejected. The free platform, WebCeph, performed comparably to manual tracing, and outperformed the commercial software, Carestream, in several key parameters.
The choice to compare measurements, not landmark identification, was made because they are the end products of the cephalometric analysis process, which provides data to aid in treatment planning [35]. OnyxCeph has been shown in a previous study [26] to be an efficient software to perform cephalometric analysis, therefore it was used as the gold standard for manual cephalometric tracing in the current study. The aforementioned study showed that minimal statistical differences were detected between the measurements generated by conventional manual tracing and those generated by digital tracing using OnyxCeph program.
The comparison between the three software in the current study was limited to Steiner and Tweed analysis. The choice of the cephalometric parameters included in the study was based on their availability in the three software. In addition, to ensure consistency between the three methods, the description of the measured cephalometric parameters in each software was checked for conformity prior to the study commencement. When interpreting the results, a difference of < 2 mm/degrees was considered clinically acceptable based on previous published research [26, 36–38].
In the current study, good to excellent intra-examiner and inter-examiner reliability was reported for all the measured parameters, except 1u-NA, which showed moderate inter-examiner reliability. Such disparity possibly pertains to the difficulty in the precise localization of the incisal edge landmark in cases where the maxillary central incisor is rotated.
The agreement of the measured parameters between the free web-based automatic platform (WebCeph) and the manual gold standard was generally good to excellent. Similarly, Mahto et al. [38] reported good to excellent agreement between WebCeph platform and conventional manual tracing. In the current study, WebCeph differed significantly from both OnyxCeph (the gold standard) and Carestream only regarding ANB, Max1-NA, Mand1-NB, 1u-NA and FMIA. The differences regarding Max1-NA and FMIA may be attributed to the difficulty of tracing the exact outline of the incisors due to the overlapping of the images of the anterior teeth in the cephalometric radiograph. Another reason may be the difficulty faced by AI to accurately locate the Or and Po points of the Frankfort horizontal plane since these points lie on indistinct areas that are not as well-defined as a suture or a sharp bony angle. The current study results are similar to the results of previous research [39, 40] which applied the same inclusion and exclusion criteria and included Onyxceph software as the gold standard. On the other hand, the results of Mercier et al. [36] and El-Dawlatly et al. [39] studies found Webceph to be less accurate than the manual gold standard method. A possible reason for the improved accuracy of WebCeph in the current study is that the AI algorithm has been trained over time thus allowing it to identify landmarks more precisely and to generate more accurate results. The clinical significance of the variations detected between the cephalometric measurements acquired using manual versus AI landmark identification is an important aspect to consider [26, 36–38]. Despite the reported statistical significance regarding ANB, Mand1-NB and 1u-NA, the mean differences of the measurements between the two tracing methods was less than 2 mm/degrees, which is considered clinically non-significant.
As for the commercial desktop-based AI software, poor to good agreement was found between Carestream and the manual gold standard for most of the measured parameters. The results showed that there were significant differences between Carestream and both OnyxCeph (the gold standard) and Webceph in SN-OcP, Mand1-NB, POr-OcP, and IMPA. These variations may be explained by the different methods each software uses to identify the occlusal plane. Moreover, as mentioned earlier, the difficulty in identification of the Frankfort horizontal plane may have contributed to these differences. The overlapping of the incisors in the anterior region makes their identification by either the manual or the automated method challenging. Moreover FMA, PFH, and PFH to AFH ratio measurements in Carestream differd significantly from both OnyxCeph and WebCeph measurements. A possible reason may be the occurrence of an error in localization of Go point, which is a bilateral structure, thus affecting the mandibular plane inclination and the PFH. When the right and left sides do not overlap, the midpoint between them is arbitrarily selected, thus possibly introducing errors in the related measurements. Go point localization may also be impacted by obstruction by surrounding anatomical structures which increases the difficulty of its detection by AI. AI systems are trained to detect patterns. However, for areas that are obscured by the surrounding structures or the overlapping structures, AI may find it harder to learn a single consistent pattern resulting in higher rates of error. Conversely, the human evaluator can rely on their personal judgment and experience to accurately determine the location of cephalometric points [27, 41]. It was also found that when using AI, if a landmark was on a well-delineated border of the cephalometric structure or a structure with high contrast, then the mean error would be less than if it was in an area that was not well delineated or in a low contrast area of the cephalometric radiograph [42]. The results of the current study are partially in agreement with previous research [43] that found significant differences between automated tracing using Carestream software and manual tracing using Dolphin 3D Imaging software regarding FMA and IMPA. However, unlike the current research, the previous study generally reported good agreement between Carestream automated tracing and manual tracing [43]. The inconsistency between the study results may have arisen from the different study settings. The radiographs in the abovementioned study were obtained from a private clinic, whereas the radiographs in the current study were collected from the archives of a University hospital where more complex cases are commonly encountered [44]. In complex malocclusions, the overlapping structures and severe skeletal deformities may preclude the accurate detection of cephalometric landmarks using AI.
In addition to being free to use, another possible advantage of the web-based system evaluated in the current study is that it eliminates the need for the time-consuming software installation procedures, and provides cloud storage of the patients’ orthodontic records, hence conserving digital storage space on personal computers and enabling access to the records from any operating system or device [45]. Additionally, being cloud-based eliminates the financial burden of buying high-performance computers with specific hardware requirements that are needed to run complex desktop software [46]. On the other hand, for orthodontists in areas with slow internet connection, a cloud-based system may present a significant drawback. In such case, a desktop-based program, once installed, can be used offline, making it more practical for operators with limited internet access. Webceph platform also provides a semi-automatic cephalometric analysis option where the operator can edit the placement of landmarks after automatic digitization by AI and it was stated that this semi-automatic method is more efficient than the fully automatic one [36, 39]. Based on the results of the current study, the accuracy of the free AI cephalometric analysis software may be considered acceptable and surpasses the commercial software. Nevertheless human supervision is still necessary after AI- assisted cephalometric analysis [27]. Additionally the accuracy of cephalometric measurements might increase if the quality of the images is improved by modifying sharpness and contrast [33].
A primary limitation of the current study is that only one software was used to represent each cephalometric tracing method, consequently limiting the generalizability of the study results. Moreover, only two cephalometric analyses (Steiner and Tweed) were evaluated in the current study. The accuracy of the tested software in performing other types of cephlometric analyses and the performance of other similar software should be assessed in future studies. Another factor that limits the generalizability of the results is the exclusion of radiographs of patients in the mixed dentition stage and patients with craniofacial anomalies. Furthermore, the performance of the AI software highly depends on the quality of the radiographic images. In the current study, images with low quality and radiographs showing motion artifacts were excluded. Future research should investigate the performance of AI software using a more diverse sample representing the real-world clinical situtaion. An additional limitation of the current study is that a single operator performed the manual tracings. Despite calibrating the researcher prior to the study commencement, this may have introduced bias. Another recommendation for future research is to incorporate 3D cephalometrics as the current study only assessed 2D cephalometric imaging.
Apart from cephalometric analysis, AI is now being implemented in face analysis and enhancement, which has potential applications in orthodontic treatment planning. For example, a recent study has demonstrated that AI was able to enhance the facial features of patients, thus guiding the orthodontists in formulating personalized treatment plans that address the patients’ esthetic concerns [47]. Additionally, some of those AI-powered apps run on smartphones, making them transportable and easily accessible, hence we recommend that these applications to be investigated in future studies.
Conclusions
Within the limitations of the current study, it was concluded that AI-assisted cephalometric analysis using the freely available web-based platform, WebCeph was more precise than AI-assisted cephalometric analysis using the commercially available desktop-based software, Carestream. The accessibility and precision of WebCeph could allow orthodontists to efficiently perform cephalometric analysis particularly in resource-limited settings where commercial software is prohibitively expensive. Human supervision over AI-assisted cephalometric tracing is advised.
Supplementary Information
Supplementary Material 1: Supplementary file 1. Landmarks that were identified in Onyxceph software
Supplementary Material 2: Supplementary file 2. Names of the measured parameters in each software
Supplementary Material 3: Supplemental Table 3. ICC values of cephalometric measurements between the three methods of cephalometric analysis
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1PRINCE S, Tsander Tito et al. Reproducibility of linear and angular cephalometric measurements obtained by an artificial-intelligence assisted software (Web Ceph) in comparison with digital software (Auto CEPH) and manual tracing method. Dent Press J Orthod. 2023: 28.01: e 2321214.10.1590/2177-6709.28.1.e 2321214.oar PMC 1006974737018830 · doi ↗ · pubmed ↗
- 2Assemble Circle Co. Ltd. Web Ceph User Manual 2020 [cited 2024 Jul 31]. Available from: https://d 3j 8buaepzkvqi.cloudfront.net/pdf/WEBCEPH_manual_v 1.pdf
- 3Kim H, Shim E, Park J, Kim Y-J, Lee U, Kim Y. Web-based fully automated cephalometric analysis by deep learning. Comput Methods Programs Biomed. 2020;194:105513.10.1016/j.cmpb.2020.10551332403052 · doi ↗ · pubmed ↗
