Development and Validation of a Score-Based Model for Estimating Esophageal Squamous Cell Carcinoma and Precancerous Lesions Risk in an Opportunistic Screening Population
Yan Bian, Ye Gao, Huishan Jiang, Qiuxin Li, Yuling Wang, Yanrong Zhang, Zhaoshen Li, Jinfang Xu, Luowei Wang

TL;DR
A new score-based model helps identify people at high risk for esophageal cancer and precancerous lesions during opportunistic screenings, reducing the need for endoscopies.
Contribution
The first score-based model for estimating ESCC and precancerous lesion risk in opportunistic screening populations.
Findings
The model detected 70.0% of high-grade lesions, 81.3% of early ESCC, and 81.1% of advanced ESCC with 76.4% specificity.
Using the model could reduce the number of individuals needing endoscopy by 75.6%.
Abstract
Currently, there are no score-based models for estimating esophageal squamous cell carcinoma (ESCC) and precancerous lesions risk in an opportunistic population. The present study developed and validated a score-based risk prediction model for opportunistic screening for ESCC for the first time, comprising 8 variables on a 21-point scale. The model could detect 70.0%, 81.3%, and 81.1% of high-grade intraepithelial neoplasia, early ESCC, and advanced ESCC, respectively, with a specificity of 76.4%. Additionally, the score-based model could result in 75.6% fewer individuals subjected to endoscopy. The utilization of the score-based model enables risk stratification and individual self-assessment of ESCC during opportunistic screening. Background: Opportunistic screening is one major screening approach for esophageal squamous cell carcinoma (ESCC). We aimed to develop a score-based risk…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2- —Science and Technology Commission of Shanghai Municipality
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEsophageal Cancer Research and Treatment · Pancreatic and Hepatic Oncology Research · Gastric Cancer Management and Outcomes
1. Introduction
Esophageal cancer (EC) represented the 11th most prevalent and 7th most deadly malignant tumor globally, with an estimated 511,054 new cases and 445,391 deaths in 2022, respectively [1,2]. EC was classified into two main histological types: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC) [3,4]. The risk factors, etiology, and distribution of ESCC and EAC differ significantly. EAC has been linked with Barrett’s esophagus, a history of gastroesophageal reflux disease, obesity, and tobacco smoking [5]. In contrast, ESCC has been associated with smoking and alcohol consumption. ESCC was predominantly concentrated in eastern and central Asia, sub-Saharan Africa, and some South American countries [6]. EAC, conversely, was mainly found in most countries in Europe, North America, and Oceania. ESCC represented over 85% of all cases [7]. It was estimated that approximately 50% of all EC cases occur in China. Furthermore, EC was the 7th most common cancer and the 5th most common cause of cancer-related mortality in China [8,9].
The absence of typical symptoms associated with early ESCC frequently resulted in the disease being diagnosed at an advanced stage, with a five-year survival rate of less than 30% [10,11]. Screening is an effective method for improving the early diagnosis of ESCC, which in turn enhances patients’ prognosis and quality of life [12,13]. Cancer screening programs could be categorized as either population-based or opportunistic screening. The implementation of mass population screening is challenging in areas with large populations due to the high cost and resource constraints involved. As a result, opportunistic screening is usually considered a viable alternative strategy [14,15]. Upper gastrointestinal endoscopy with targeted biopsy represented a validated approach to ESCC screening [16]. However, the high cost rendered it unsuitable for use in screening [12].
Risk stratification models could be used to assess each individual’s risk using relevant predictors so that only those at high risk are recommended for further endoscopy. This approach would allow the development of a cost-effective opportunistic screening strategy and holds promise in improving the sensitivity of endoscopic detection by alerting the endoscopists. However, the risk prediction model based on opportunistic screening populations remained inadequate. Liu et al. developed an ESCC predicting model for opportunistic screening constructed by logistic regression with five predictors [17]. The model proposed by Liu et al. employed alarm symptoms, such as dysphagia, as predictors, which may render the model more applicable to diagnosis than screening. Furthermore, the utilization of algorithms without a score assigned may impede the generalization of the model and patient self-assessment.
In this study, we developed and externally validated a score-based prediction model for estimating the risk of ESCC and precancerous lesions using data from a hospital-based opportunistic population in China. The score-based model facilitates the identification of individuals at high risk of ESCC, thereby enabling the implementation of further endoscopy in opportunistic screening.
2. Methods
2.1. Study Population
This study was a secondary analysis of a published nationwide, multicenter screening trial conducted in ESCC high-risk areas in China (Esophageal Cancer Screening Trial, ClinicalTrials.gov, NCT04609813). The participants in this study were recruited from 39 centers, all of which were either tertiary or secondary referral centers, between 1 January 2021 and 31 May 2022. The procedure of participant recruitment was based on the Esophageal Cancer Screening Trial, initiated by Shanghai Changhai Hospital [18]. The 21 centers situated in northern China were designated as the training cohort, whereas the 18 centers located in southern China were classified as the validation cohort. The geographic location of China is distinguished by the presence of the Qinling Mountain–Huaihe River Line, which serves to delineate the country’s southern and northern regions.
The participants in this study were consecutively recruited from outpatients at all study centers undergoing upper gastrointestinal endoscopy. The inclusion criteria were as follows: (1) age 40 to 75 years, (2) no history of esophageal neoplasia or cancer, (3) no alarming symptoms, including dysphagia, hematemesis, and melena, (4) written informed consent provided, and (5) an adequate upper gastrointestinal endoscopy undergone. A total of 14,597 participants were recruited, and the model was developed and tested by the pre-established training and validation cohorts.
The study was approved by the Shanghai Changhai Hospital Ethics Committee (No. CHEC2020-088) and was reviewed by all participating institutes. Written informed consent was obtained from all participants.
2.2. Study Procedures
As part of the Esophageal Cancer Screening Trial process, all enrolled participants completed a questionnaire and finally underwent upper gastrointestinal endoscopy.
2.2.1. Questionnaire Survey
A structured online questionnaire was used in the present study. The questionnaire included baseline information (age, sex, residence, education level, body weight, and height), living styles (cigarette smoking [smoke more than one cigarette every day for more than one year; if yes, the number of cigarettes and duration were asked], alcohol drinking [alcohol drinking more than once every week for more than one year; if yes, the kind of wine, drinking frequency, and alcohol flushing were asked]), eating habits (hot food preference [hot food was defined as that had hot sensation in the mouth], pickled food preference [high frequency was defined as more than three times per week]), tooth loss, and family history of EC (first- or second-degree relatives).
2.2.2. Upper Gastrointestinal Endoscopy Examination
All participants underwent upper gastrointestinal endoscopy. A white light view of the esophagus and stomach is taken, which is consistent with the approach used in the majority of endoscopy centers. Narrow-band imaging was requested to view the full length of the esophagus or Lugol’s chromoendoscopy for the examination of suspicious lesions. All suspicious lesions were subject to biopsy. The photographs or videos of upper gastrointestinal endoscopy were reviewed by two independent expert investigators. The biopsy specimens were evaluated by two experienced gastrointestinal pathologists following standard processing procedures. High-grade intraepithelial neoplasia (HGIN) and ESCC identified by biopsy were then subjected to further evaluation for the purpose of determining the standardized treatment. The pathology reports of endoscopic or surgical resection specimens were obtained for the purpose of further confirmation.
2.3. Outcomes
The primary outcome of this study was histology-confirmed, esophageal high-grade lesions, including squamous epithelial HGIN, early ESCC, and advanced ESCC. In this study, squamous epithelial HGIN was defined as the presence of dysplastic squamous epithelial cells that occupied over half of the whole epithelium. Early ESCC is confined to the mucosa, with no deeper involvement and no locoregional or distant spread [19]. Barrett’s esophagus, glandular epithelia HGIN, and EAC were not considered as the target outcomes for this study. The outcome measures included the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, accuracy rate, positive predictive value (PPV), negative predictive value (NPV), positive and negative likelihood ratio (LR), and the number needed to screen to detect one case of high-grade lesions (NNS).
2.4. Statistical Analysis
In the training cohort, univariable logistic regression was employed to ascertain risk factors for esophageal high-grade lesions, which were then utilized as candidate predictors for the construction of the model. Predictors with p value < 0.1 and odds ratio (OR) > 1.0 were subjected to multivariable logistic regression models. The exclusion of predictors from the multivariable logistic regression model was based on p values ≥ 0.1. This ensured that the final model included predictors with p value < 0.05 and OR > 1.0. Points were assigned by dividing the regression coefficients by the absolute value of the smallest coefficient of predictors and rounding up to the nearest 0.5. A score-based prediction model was constructed by aggregating the scores of each predictor for each participant.
Receiver operating characteristic curves (ROCs) were employed to evaluate the diagnostic efficacy of the model, and the score at the maximum of the Youden index was designated as the cut-off value. The score-based model and cut-off value were applied to the validation cohort for external validation and further analysis. The diagnostic performance of the score-based model was assessed by using AUROC, sensitivity, specificity, accuracy rate, PPV, NPV, positive LR, negative LR, and NNS. The calibration of the score-based model was assessed with calibration curves. Decision curve analysis (DCA) was used to report the net clinical benefit of the score-based model.
In this study, the comparison of the characteristics of the participants in the training and validation cohorts was conducted using the chi-squared test for categorical variables. All tests were two-sided and p values < 0.05 were considered to be significant. Analyses were performed with GraphPad Prism (version 9.0.0), MedCalc (version 20.022), and R software (version 4.2.2).
3. Results
3.1. Participant Characteristics
As shown in Table 1, a total of 7899 and 6698 participants were enrolled in the training and validation cohorts, respectively. In the training cohort, 153 patients were identified through histology as having esophageal high-grade lesions, comprising 41 HGIN, 27 early ESCC, and 86 advanced ESCC cases. The validation cohort comprised 98 esophageal high-grade lesions, including 30 HGIN, 32 early ESCC, and 37 advanced ESCC. Notably, one patient in each of the training and validation cohorts exhibited multiple malignant lesions, resulting in a discrepancy between the total number of lesions and the number of patients with high-grade lesions. Additionally, only BMI was balanced between the training and validation cohorts, while all other variables were significantly different. This indicated that there were significant differences in the characteristics of the opportunistic screening populations in southern and northern China, which could facilitate adequate validation of the model performance.
3.2. Development of the Scored-Based Prediction Model
To construct the model, 12 potential risk factors were selected as candidate predictors for the univariable logistic regression model (Table S1). The univariable logistic regression analysis revealed that the majority of the risk factors exhibited p values < 0.05, except alcohol flushing, hot food preference, and family history. Nevertheless, the univariable logistic regression p values for alcohol flushing and family history were less than 0.1, thus necessitating their inclusion in the multivariate logistic regression model.
The initial multivariable logistic regression model included 11 variables (Table S2). Of these, three variables (education level, alcohol drinking, and alcohol flushing) had p values ≥ 0.1, indicating that they were not statistically significant. Subsequently, the remaining eight variables (p < 0.1) were finally included in the multivariable logistic regression (Table 2). The results of the second multivariable logistic regression showed p values < 0.05 and ORs > 1.0 for all 8 variables.
The scores for each predictor are shown in Table 2, which constituted the score-based model for esophageal high-grade lesions as follows: age (4 for 50–59 years old; 6.5 for 60–69 years old; 9.5 for >69 years old), sex (2.5 for male), residence (2.5 for rural), BMI (1.5 for ≤22 kg/m^2^), cigarette smoking (1.5 for yes and smoking ≤ 30 pack-years, and 2 for yes and smoking > 30 pack-years), pickled food preference (1.5 for high), tooth loss (1.5 for >4), family history (1.5 for yes), with the total scores of each individual ranging from 0 to 21.
The AUROC for the score-based model was 0.833 (95%CI, 0.803–0.862) in the training cohort (Figure 1). Table S3 shows the diagnostic performance of each score as a cut-off value in identifying individuals at high risk of high-grade lesions in the training cohort. As the score increases, the risk of high-grade lesions rises concomitantly with a decline in the proportion of recommendations for endoscopy and sensitivity. Conversely, specificity shows a gradual increase. A score of 9 at the maximum Youden Index (0.526) was selected as the cut-off value, with a sensitivity and specificity of 84.3% (95%CI, 77.6–89.7%) and 68.3% (95%CI, 67.3–69.4%), respectively. The sensitivity of this model for the detection of HGIN, early ESCC, and advanced ESCC was 82.9% (95%CI, 67.9–92.9%), 81.5% (95%CI, 61.9–93.7%), and 86.1% (95%CI, 76.9–92.6%), respectively (Table 3). Additionally, 32.7% of individuals were identified as high-risk individuals, and one case of high-grade lesions could be detected by performing 20 upper gastrointestinal endoscopies (Table 3).
3.3. External Validation of the Scored-Based Prediction Model
Furthermore, the scored-based model also showed excellent discriminative performance in the validation cohort. The AUROC for the score-based model was 0.828 (95%CI, 0.793–0.864) in the validation cohort (Figure 1). The overall sensitivity and specificity for the model were 77.6% (95%CI, 68.0–85.4%) and 76.4% (95%CI, 75.4–77.4%) at the cut-off score of 9, respectively (Table 3). The model exhibited a high capacity for distinguishing between various esophageal high-grade lesions, with a sensitivity of 70.0% (95%CI, 50.6–85.3%), 81.3% (95%CI, 63.6–92.8%), and 81.1% (95%CI, 64.9–92.0%) for HGIN, early ESCC, and advanced ESCC, respectively (Table 3). The model identified 24.4% of the individuals in the validation cohort as high-risk, indicating the need for further endoscopy (Table 3). Furthermore, the model demonstrated a significant reduction in the number of screening cases required to detect one case of high-grade lesions, from 68 to 21 (Table 3). The accuracy rate, PPV, NPV, positive LR, and negative LR are summarized in Table 3.
3.4. Model Calibration and Clinical Utility
The Hosmer–Lemeshow goodness-of-fit test and calibration plot analysis indicated that the score-based model demonstrated satisfactory calibration in both the training and validation cohorts (Table S4, Figure 2). Additionally, DCA demonstrated the net clinical benefit of utilizing the model in comparison to the alternative scenarios of all endoscopic screening and no endoscopic screening (Figure S1).
4. Discussion
In this study, we developed and externally validated a score-based model to identify the high-risk individuals for esophageal high-grade lesions in Chinese opportunistic screening. The model comprises eight predictors on a 21-point scale, with a score of 9 serving as the cut-off value, indicating high-risk individuals (Figure S2). The model could detect 70.0%, 81.3%, and 81.1% of HGIN, early ESCC, and advanced ESCC, respectively, with a specificity of 76.4%. Additionally, the score-based model could result in 75.6% fewer individuals subjected to endoscopy. Previous studies of ESCC risk stratification models were usually based on a general population, overlooking the development of risk stratification models appropriate for an opportunistic population [20,21,22,23]. The risk stratification model developed by Liu et al. was based on an opportunistic population. However, this model was calculated by a logistic regression algorithm, which is not easily and quickly calculable by users. Additionally, Liu et al.’s model incorporated alarm symptoms as predictors, which may impede the model’s capacity to detect early lesions. To our knowledge, the model in this study was the first score-based model constructed for an opportunistic population, exhibiting superior discriminatory capacity for different grades of esophageal malignancy lesions. In this study, particular emphasis was placed on the identification of early lesions, with a specific focus on lesions that could be endoscopically resected in a curative manner. This approach has the potential to enhance the long-term prognosis and quality of survival for patients. Additionally, the score-based model demonstrated robust discriminative efficacy on both the training cohort and the external validation cohort, comprising a significantly heterogeneous population. This indicated that the model may have general clinical applications. The model could facilitate the assessment of the patient’s risk of ESCC during opportunistic screening, thereby informing endoscopic referral decisions. Furthermore, it could alert endoscopists to individuals at high risk of ESCC.
It is recommended that the range of ESCC screening modalities be expanded to facilitate more effective early detection and treatment. Currently, organized population screening has yielded favorable outcomes in several high-risk regions within China [23,24]. However, the potential for nationwide expansion is constrained by the vast population size and the relatively limited availability of medical resources. It can be reasonably proposed that the introduction of opportunistic screening can serve to complement the existing screening system for ESCC. Additionally, the target population for opportunistic screening is typically characterized by a higher level of compliance. Furthermore, the score-based risk stratification model could assist physicians from different specialties in making standardized referral decisions and facilitate the implementation of opportunistic screening in a manner that is both straightforward and efficient. In addition, the utilization of score-based scales for patient self-assessment has the potential to enhance patient motivation to engage in screening activities. Compared to previous studies [17,25], the opportunistic screening population included in this study was larger, covered a wider geographic area, and included more precancerous lesions and early ESCC. This could enhance the extrapolation of the study and the potential for clinical applications.
The majority of the eight predictors included in the model constructed for this study have been previously identified as risk factors for ESCC [21]. Furthermore, a variety of methods were employed to guarantee the precision of data collection, including the precise delineation of risk factors, the provision of training for researchers, and other related strategies. Nevertheless, there was a possibility of inaccuracies in the information provided on specific factors, including the quantification of alcohol intake and the temperature of the food. The predictors in the final model are characterized by easy-to-collect information and high accuracy, thus ensuring a high degree of reliability. Predictors such as cigarette smoking, pickled food preference, and family history have been used in previous predictive models for ESCC and are proven risk factors for ESCC [21]. Alcohol drinking, alcohol flushing, hot food preference, and education level were excluded from the final model because they were not significant in the multivariable logistic regression model. In addition, residence was included as a predictor in this study, a factor that has rarely been used in previous studies [20,22,23]. This may be because the populations included in the previous studies were all from high-risk rural areas. Furthermore, this study was the first to include tooth loss as a predictor. Some studies have demonstrated that tooth loss is an indicator of oral microbial dysbiosis, which can subsequently lead to an increased risk of developing ESCC [26,27,28]. In this study, individuals presenting with alarming symptoms were excluded, as these individuals may already have ESCC and thus require endoscopy, which would diminish the potential benefit of early diagnosis and screening from the prediction model [29].
By reviewing previous studies, risk stratification models that use only epidemiological factors to achieve similar diagnostic efficacy have been shown to be sufficiently superior [20,21,22,23,24,25]. The objective of this study is to devise a risk stratification scale that is both rapid and straightforward to utilize, with the aim of facilitating the rapid referral of opportunistic populations. In the pursuit of enhancing the diagnostic efficacy of the model, there is a potential necessity to consider incorporating diagnostic molecular markers or advanced imaging techniques in subsequent studies [12,13,30,31]. Furthermore, the validation of the model on more diverse or population-based cohorts is necessary. In subsequent studies, we will collaborate with villages or communities to invite populations to participate in an ESCC screening study with endoscopy by message, and to send them an invitation to self-assess using the scale in this study. Therefore, the scale’s diagnostic efficacy in a general population could be assessed.
Furthermore, the cut-off score in this study could be adaptable, according to clinically applicable scenarios. For instance, the cut-off scores are appropriately adjusted downward in high-risk and medically adequate areas to increase sensitivity. In low-risk areas, the cut-off scores can be appropriately adjusted upwards to ensure a lower false-positive rate. However, the cut-off scores’ adjustment strategy must be further explored and validated in future studies. Nevertheless, irrespective of the cut-off scores selected, it is inevitable that some of the early lesions will be overlooked. Consequently, individuals considered to be at high risk, who are capable of doing so, should be advised to undergo upper gastrointestinal endoscopy at the appropriate time [32].
The present study also has several potential limitations. Firstly, despite the large sample size and nationwide recruitment from 39 centers, it remains challenging to provide a comprehensive representation of the opportunistic population. Secondly, there is a possibility of recall and self-reporting bias in questionnaires, despite the study having been clearly defined and standardized training being conducted. Furthermore, the study population was not followed up, thus precluding the determination of the incidence of ESCC in high-scoring individuals in future years.
5. Conclusions
We have developed a score-based risk prediction model for ESCC and precancerous lesions based on eight epidemiological factors in the opportunistic population. The model demonstrated a high degree of accuracy in its predictive capabilities, and its performance has been validated in an independent population. The study yielded an accessible tool for clinical practice that could assist physicians of different specialties in making referral decisions and self-assessment for the risk of ESCC without additional financial burden.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bray F. Laversanne M. Sung H. Ferlay J. Siegel R.L. Soerjomataram I. Jemal A. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries CA: A Cancer J. Clin.20247422926310.3322/caac.2183438572751 · doi ↗ · pubmed ↗
- 2Teng Y. Xia C. Cao M. Yang F. Yan X. He S. Cao M. Zhang S. Li Q. Tan N. Esophageal cancer global burden profiles, trends, and contributors Cancer Biol. Med.20242165666610.20892/j.issn.2095-3941.2024.014539066471 PMC 11359494 · doi ↗ · pubmed ↗
- 3Qi L. Sun M. Liu W. Zhang X. Yu Y. Tian Z. Ni Z. Zheng R. Li Y. Global esophageal cancer epidemiology in 2022 and predictions for 2050: A comprehensive analysis and projections based on GLOBOCAN data Chin. Med. J.20241373108311610.1097/CM 9.000000000000342039668405 PMC 11706580 · doi ↗ · pubmed ↗
- 4Wu Y. He S. Cao M. Teng Y. Li Q. Tan N. Wang J. Zuo T. Li T. Zheng Y. Comparative analysis of cancer statistics in China and the United States in 2024 Chin. Med. J.20241373093310010.1097/CM 9.000000000000344239654104 PMC 11706596 · doi ↗ · pubmed ↗
- 5Kamangar F. Nasrollahzadeh D. Safiri S. Sepanlou S.G. Fitzmaurice C. Ikuta K.S. Bisignano C. Islami F. Roshandel G. Lim S.S. The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017 Lancet Gastroenterol. Hepatol.2020558259710.1016/S 2468-1253(20)30007-832246941 PMC 7232026 · doi ↗ · pubmed ↗
- 6Yang H. Wang F. Hallemeier C.L. Lerut T. Fu J. Oesophageal cancer Lancet 20244041991200510.1016/S 0140-6736(24)02226-839550174 · doi ↗ · pubmed ↗
- 7Morgan E. Soerjomataram I. Rumgay H. Coleman H.G. Thrift A.P. Vignat J. Laversanne M. Ferlay J. Arnold M. The Global Landscape of Esophageal Squamous Cell Carcinoma and Esophageal Adenocarcinoma Incidence and Mortality in 2020 and Projections to 2040: New Estimates From GLOBOCAN 2020 Gastroenterology 2022163649658.e 210.1053/j.gastro.2022.05.05435671803 · doi ↗ · pubmed ↗
- 8Han B. Zheng R. Zeng H. Wang S. Sun K. Chen R. Li L. Wei W. He J. Cancer incidence and mortality in China, 2022 J. Natl. Cancer Cent.20244475310.1016/j.jncc.2024.01.00639036382 PMC 11256708 · doi ↗ · pubmed ↗
