Diagnostic accuracy of AI-assisted chest radiographs in tuberculosis screening: A Ghanaian clinical study

Derick Seyram Sule; Kofi Adesi Kyei; William Kwadwo Antwi; Godwill Acquah; Klenam Dzefi-Tettey; Joseph Daniels; Andrew Yaw Nyantakyi

PMC · DOI:10.1371/journal.pone.0342988·March 27, 2026

Open Access

TL;DR

An AI system was more accurate than a radiologist in detecting tuberculosis from chest X-rays in a Ghanaian study.

Contribution

The study demonstrates AI's superior diagnostic performance over radiologists in TB screening using chest X-rays in a high-burden setting.

Findings

01

The AI system achieved 91% accuracy, outperforming the radiologist's 86% accuracy in TB screening.

02

AI showed higher agreement with GeneXpert MTB/RIF results (κ = 0.79) compared to the radiologist (κ = 0.69).

03

AI's sensitivity (86%) and specificity (93%) were higher than the radiologist's (84% and 87%).

Abstract

Tuberculosis remains a major global health challenge, particularly in resource-limited settings where access to expert radiological interpretation is constrained. Artificial intelligence offers a promising solution to enhance diagnostic accuracy and efficiency in TB screening. This study aimed to evaluate the diagnostic performance of an AI-based system compared to a radiologist in screening for TB using chest X-rays from 1,010 patients. Patients were adults ≥18 years with suspected TB in a high-burden setting. GeneXpert MTB/RIF served as reference to assess accuracy, sensitivity, specificity, PPV, NPV, and AUC for radiologist and AI TB predictions. Comparisons used McNemar’s test and Cohen’s kappa to evaluate agreement and significance of differences. The AI system demonstrated superior performance with an accuracy of 91%, sensitivity of 86%, specificity of 93%, PPV of 85%, NPV of…

Figures

Fig 2 — ROC Curve comparing radiologist performance and AI system.

Fig 3 — Confusion matrix for radiologist predictions.

Fig 4 — Confusion matrix for AI predictions.

Fig 5 — Distribution of TB cases across the three modalities.

Taxonomy

Topics: COVID-19 diagnosis using AI · Tuberculosis Research and Epidemiology · Artificial Intelligence in Healthcare and Education

Full text

Introduction

Tuberculosis (TB) remains a significant public health challenge globally, particularly in low- and middle-income countries (LMICs), including Ghana [1]. Despite advances in treatment and control strategies, early detection continues to be a bottleneck in TB elimination efforts [2,3]. Two-dimensional chest radiography (2D CXR) is a widely used screening tool because of its accessibility and cost-effectiveness, yet its diagnostic accuracy is often limited by inter-reader variability and a shortage of trained radiologists [4,5].

Advances in artificial intelligence (AI) have introduced promising solutions to enhance the interpretation of chest radiographs. AI-assisted diagnostic tools, particularly those leveraging deep learning algorithms, have demonstrated potential in automating the detection of pulmonary abnormalities consistent with TB [6–8]. Combining AI with human readers improves diagnostic accuracy. AI-assisted chest radiograph interpretation has gained traction as a supplementary tool in TB screening, with studies reporting sensitivity as high as 92–94% and specificity of 95–98.2% [7,9]. These findings underscore the potential of AI to support large-scale screening programs, particularly in LMICs where radiologists are scarce.

Despite these advances, challenges remain. Santosh, Shen, and Zhang (2022) provided an overview of deep learning approaches for TB screening, noting that algorithm performance is influenced by image quality, dataset diversity and training methodology [10]. A further review emphasized the need for contextual validation and integration into clinical workflows [11]. Furthermore, concerns about false positives persist, hence the need for robust validation in diverse populations. In light of this, there has been increased investment in digital health tools, including AI, to accelerate TB elimination efforts [12,13]. The applicability of AI tools in African contexts is still emerging. Recent innovations have focused on deploying AI tools in real-world, resource-constrained settings with promising results [14,15]. These tools offer rapid, scalable and standardized assessments, which are especially valuable in such settings.

In Ghana, the TB case detection rate has decreased [16]. The integration of AI into diagnostic workflows could significantly reduce time to diagnosis and improve screening outcomes in this high-prevalence TB region, where radiological expertise is unevenly distributed [17]. Only a few studies have systematically assessed these tools, and the clinical utility and diagnostic accuracy of AI-assisted radiography for TB in the Ghanaian context remain underexplored. This gap highlights the need for context-specific research in Ghana to inform policy and clinical practice. This study seeks to bridge the evidence gap and support informed implementation of AI in TB screening by evaluating the performance of AI-assisted 2D CXR interpretation compared with that of a radiologist within a Ghanaian clinical setting.

Methods

The study employed a descriptive, retrospective, quantitative cross-sectional design and was conducted at the Family Medicine department of the Korle Bu Teaching Hospital. The department is a major point of entry to the facility, offers a range of basic and limited specialized care, and has a dedicated X-ray unit. The study period spanned 1 January 2021 to 1 January 2025, and the study aimed to assess the performance of a Convolutional Neural Network (CNN)-assisted model in detecting pulmonary tuberculosis (TB) using 2D CXR. Data collection started on 30 March 2024 upon receipt of ethical approval. Total sampling was used, with consecutive inclusion of eligible patients over the study period. Patients included in the study were >18 years old with microbiological confirmation of TB via GeneXpert MTB/RIF assay results, as recorded by attending physicians in the hospital’s health information management system.

All radiographs were posteroanterior (PA) views. They were exported in DICOM format and converted to PNG. Images were resized to 224 × 224 pixels and normalized to enhance contrast and mitigate illumination variability. A ResNet50-based CNN architecture was employed, initialized with weights from the ImageNet dataset and fine-tuned using a combined dataset of publicly available bacteriologically confirmed TB radiographs from the same facility from preceding years. The CNN model was independently applied to the set of chest radiographs, generating probability scores for the presence of TB.

A classification threshold of 0.5 was used to determine TB-positive and TB-negative predictions. Data augmentation techniques including horizontal flipping, zooming, and rotation were applied to increase model robustness. Radiographic interpretation was based on a retrospective review conducted by a certified radiologist with experience in thoracic imaging. The radiologist’s blinded original diagnostic reports, documented at the time of clinical care, were extracted from the hospital’s radiology information system without modification. These reports served as the human comparator for evaluating the CNN model’s performance. Model performance and radiologist interpretation were both compared against laboratory confirmation for sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC-ROC). Discrepancies between the CNN model and radiologist reports were analyzed to identify patterns of diagnostic divergence.
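The thresholding and scoring steps described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the tie rule at exactly 0.5, and the toy scores are all assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple


def classify(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn probability scores into binary labels (1 = TB-positive)."""
    # Assumption: a score exactly at the threshold is treated as positive.
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(reference: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) of predictions against the reference standard."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    return tp, fp, fn, tn


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical toy data, not taken from the study.
scores = [0.91, 0.12, 0.55, 0.40, 0.08, 0.73]
reference = [1, 0, 1, 1, 0, 0]  # 1 = positive on the reference test

predicted = classify(scores)
metrics = diagnostic_metrics(*confusion_counts(reference, predicted))
print(predicted, metrics)
```

The same `confusion_counts` call would apply unchanged to the radiologist's binary reports, since both readers are scored against the same reference labels.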

Statistical comparison was performed between model predictions and expert radiologist interpretations. McNemar’s test assessed differences in paired proportions, and Cohen’s kappa measured agreement. Confidence intervals were computed at the 95% level, with statistical significance set at p < 0.05, using SPSS v26. Ethical approval was obtained from the Ethics and Protocol Review Committee of the School of Biomedical and Allied Health Sciences, College of Health Sciences, University of Ghana. Because secondary data were collected from patient hospital files, the ethics committee waived the requirement for patient consent to participate. The study adhered to the Declaration of Helsinki, and data confidentiality and patient anonymity were maintained throughout by removing all patient identifiers and replacing them with unique study numbers. The framework of the study is depicted in Fig 1.

Fig 1. Framework of the study.

Results

Patient screening overview

A total of 1,010 patients who met the inclusion criteria were sampled during the study period. The gold standard for diagnosis in this study was laboratory confirmation of TB. The test was positive in 323 patients and negative in 687. The numbers of patients classified as positive by the radiologist (357) and by the AI system (327) both exceeded the number confirmed by laboratory testing. However, the AI prediction was closer to laboratory confirmation (327 versus 323 patients) than the radiologist review was (357 versus 323 patients), as shown in Table 1. Among patients deemed positive on radiologist review, 323 were truly positive for TB by laboratory confirmation and the remaining 34 were falsely positive (that is, negative on laboratory confirmation). For the AI prediction, 301 patients were truly positive and 26 falsely positive on laboratory confirmation, as shown in Table 2.

Table 1: Distribution of TB Status by Laboratory Confirmation, Radiologist, and AI Prediction.

Table 2: Radiologist and AI Detection Outcomes Stratified by GeneXpert MTB/RIF Results.
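As a quick arithmetic check, the share of the 1,010 patients labelled TB-positive by each modality can be recomputed from the counts reported in the text. The variable names below are illustrative; only the counts come from the study.

```python
TOTAL_PATIENTS = 1010

# Positive counts as reported in the text for each modality.
positives = {
    "GeneXpert MTB/RIF": 323,
    "Radiologist": 357,
    "AI": 327,
}

# Percentage of all patients labelled TB-positive, to one decimal place.
percent_positive = {
    name: round(100 * count / TOTAL_PATIENTS, 1)
    for name, count in positives.items()
}

# How many more positives each reader called than the reference test found.
excess_over_reference = {
    name: count - positives["GeneXpert MTB/RIF"]
    for name, count in positives.items()
    if name != "GeneXpert MTB/RIF"
}

print(percent_positive)
print(excess_over_reference)
```

These totals describe only how many positives each modality called; they say nothing about which individual patients were classified correctly.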

The AI system showed higher sensitivity (86%), specificity (93%), PPV (85%), NPV (94%) and AUC (0.90) than the radiologist, as shown in Table 3. The higher AUC indicates superior ability to distinguish TB-positive from TB-negative cases, as depicted in Fig 2.

Table 3: Diagnostic Performance of Radiologist and AI Predictions Compared with GeneXpert MTB/RIF Reference Standard.
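Overall accuracy is a prevalence-weighted average of sensitivity and specificity, so the reported figures can be cross-checked against one another. The sketch below uses the rounded sensitivity and specificity values quoted in the text and the laboratory-confirmed prevalence (323 of 1,010); because the inputs are rounded, agreement can only be expected to about two decimal places.

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Accuracy = sensitivity * prevalence + specificity * (1 - prevalence)."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


prevalence = 323 / 1010  # laboratory-confirmed positives over all patients

# Rounded sensitivity and specificity as quoted in the text.
acc_ai = accuracy_from_rates(0.86, 0.93, prevalence)
acc_radiologist = accuracy_from_rates(0.84, 0.87, prevalence)

print(round(acc_ai, 2), round(acc_radiologist, 2))
```

Both values round to the accuracies reported for the AI system (91%) and the radiologist (86%).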

Fig 2. ROC curve comparing radiologist performance and AI system.

Statistical comparison using McNemar’s test

McNemar’s test for differences in paired proportions between the radiologist and AI predictions gave a test statistic of χ² = 9.46 (p = 0.0021).
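For readers unfamiliar with the test, McNemar's statistic depends only on the discordant pairs (cases where exactly one of the two readers is correct) and is referred to a chi-square distribution with one degree of freedom. The sketch below is illustrative: the discordant counts are hypothetical, the continuity correction is an assumption (the paper does not say which variant was used), and reading the reported 9.46 as the chi-square statistic is likewise an assumption.

```python
import math


def mcnemar_chi2(b: int, c: int, continuity: bool = True) -> float:
    """McNemar chi-square statistic from the two discordant-pair counts.

    b: cases reader A got right and reader B got wrong
    c: cases reader A got wrong and reader B got right
    """
    diff = abs(b - c)
    if continuity:
        # Edwards' continuity correction (an assumption for this sketch).
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


def chi2_p_value_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical discordant counts, not taken from the study.
example_stat = mcnemar_chi2(b=40, c=70)
example_p = chi2_p_value_1df(example_stat)

# P-value implied by a chi-square statistic of 9.46 with 1 degree of freedom.
p_reported = chi2_p_value_1df(9.46)
print(example_stat, example_p, p_reported)
```

Under that reading, `p_reported` comes out at roughly 0.002, in line with the p-value given in the text.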

Cohen’s kappa values measuring the level of agreement among the GeneXpert MTB/RIF result, the radiologist and the AI predictions are shown in Table 4. The agreement between AI predictions and the GeneXpert MTB/RIF result (κ = 0.79) was higher than that of the radiologist (κ = 0.69). The agreement between radiologist and AI predictions was moderate (κ = 0.53).

Table 4: Cohen’s kappa measured the level of agreement between the GeneXpert MTB/RIF result, radiologist and AI predictions.
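Cohen's kappa corrects observed agreement for the agreement expected by chance. The sketch below computes it for a 2 × 2 table and maps the result to the commonly used Landis and Koch descriptive bands; the example table is hypothetical, and the use of those bands is an assumption, although it matches the "moderate" and "substantial" labels applied to the study's values.

```python
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters with binary labels.

    a: both positive        b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive        d: both negative
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from each rater's marginal positive/negative totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Descriptive label for kappa using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2 x 2 table, not taken from the study.
kappa_example = cohens_kappa(a=40, b=10, c=5, d=45)

# Labels for the kappa values reported in the text.
labels = {k: landis_koch(k) for k in (0.79, 0.69, 0.53)}
print(kappa_example, labels)
```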

Discussion

This study evaluated the diagnostic performance of a radiologist and an AI-based system in screening for tuberculosis (TB) using a dataset of 1,010 patients. The results demonstrate that the AI system matched or exceeded the radiologist's report across multiple performance metrics (Figs 2–5), including accuracy, sensitivity, specificity, PPV, NPV, and Cohen’s kappa agreement with the GeneXpert MTB/RIF reference standard. The AI system achieved an accuracy of 91%, sensitivity of 86%, and specificity of 93%, compared to the radiologist’s accuracy of 86%, sensitivity of 84%, and specificity of 87%. Cohen’s kappa was used to measure inter-rater agreement. The agreement between the AI prediction and the GeneXpert MTB/RIF result was substantial (κ = 0.79), higher than that of the radiologist (κ = 0.69). The moderate agreement (κ = 0.53) between the radiologist and AI predictions suggests that the two modalities may differ in their diagnostic approach, with the AI system potentially offering complementary insights. The significant result from McNemar’s test (p = 0.0021) further indicates that the AI system’s performance is not equivalent to that of the radiologist and may represent a meaningful improvement in TB screening accuracy. The ROC curves illustrate each modality’s ability to distinguish TB-positive from TB-negative cases. The AI system achieved a higher area under the curve (AUC = 0.90) than the radiologist (AUC = 0.86), indicating superior classification accuracy, as depicted in Fig 2. Furthermore, the AI system’s false-positive rate of 4.3% (Fig 4) was lower than the radiologist’s rate of 8.5% (Fig 3).

Fig 3. Confusion matrix for radiologist predictions.

Fig 4. Confusion matrix for AI predictions.

Fig 5. Distribution of TB cases across the three modalities.

The higher PPV and NPV values further support the AI’s robustness in clinical decision-making, particularly in settings where minimizing false positives and false negatives is critical. By serving as a triage tool, it can flag suspicious cases for prompt review, thereby reducing diagnostic delays and optimizing workflow efficiency in areas with few radiologists or high patient volumes. Although the observed difference in accuracy between the two approaches is statistically significant, its clinical relevance within Ghanaian tuberculosis screening programmes requires careful consideration. Laboratory testing in this study identified 32% of patients as TB-positive, while both the radiologist (35.3%) and the AI system (32.4%) predicted slightly higher proportions, with the radiologist reporting the highest (Fig 5). While the improvements in accuracy and specificity suggest a potential reduction in false-positive diagnoses (Fig 4) and unnecessary follow-up investigations, the modest gain in sensitivity may result in only a limited reduction in missed TB cases. In a high-burden setting such as Ghana, sensitivity is particularly critical because false-negative results can delay treatment initiation and contribute to ongoing transmission. Furthermore, the practical impact of these performance gains must be considered alongside contextual factors, including implementation costs of AI, infrastructure requirements, workforce capacity, and compatibility with existing radiology and TB control program workflows. Therefore, although the observed differences are statistically significant, their true clinical and public health value depends on whether the incremental improvements translate into meaningful enhancements in case detection, resource utilization, and overall TB control outcomes in Ghana.

These findings corroborate studies conducted in other regions of the world. Studies from across the globe consistently show that AI systems for TB detection can perform at or above radiologist level, with variability largely attributable to reference standards, prevalence rates, and imaging protocols. For example, in Zambia, CAD4TB was evaluated and reported a sensitivity of 88% and specificity of 75% against GeneXpert MTB/RIF, showing comparable performance to human readers in community screening [18]. Studies from Western countries report AUCs above 0.90, while in Asia AI AUC values fell between 0.951 and 0.975 with sensitivity and specificity above 85% [10,19,20]. These findings mirror our study outcomes, reinforcing the reliability and generalizability of AI models when locally calibrated. This study contributes to this global literature by demonstrating measurable improvements in diagnostic accuracy in Ghana, highlighting the potential impact of AI deployment in similar resource-limited settings.

These results underscore the feasibility of deploying AI-assisted radiography in Ghana. The integration of AI into radiographic workflows offers significant advantages in resource-limited settings, where a shortage of radiologists and high patient volumes can delay diagnosis [21]. AI-assisted interpretation can serve as a preliminary screening tool, flagging suspicious cases for further review and potentially accelerating the diagnostic process [22]. Moreover, the use of AI may help standardize interpretations across institutions, improving diagnostic equity and reducing the burden on stressed healthcare professionals. This is particularly relevant in rural and underserved areas where access to expert radiologists is very limited [23,24]. The tool’s offline functionality and rapid image processing make it particularly suitable for decentralized screening programs, mobile clinics and community healthcare initiatives [25].

However, several considerations must be addressed before widespread implementation. First, algorithmic performance may vary based on image acquisition protocols, patient demographics and disease prevalence [26–28]. Local calibration and continuous validation are essential to sustain accuracy and reduce algorithmic bias [26]. In addition, integration into existing health systems requires training, infrastructure support and regulatory oversight to safeguard patient data and diagnostic integrity [29].

Implications

In resource-limited settings, where access to expert radiologists is limited, AI systems can serve as effective triage tools, reducing diagnostic delays and improving patient outcomes. The AI can augment clinical expertise by flagging high-risk images for expedited radiologist review and laboratory confirmation.

Limitations

The dataset of 1,010 patients, selected according to specific image requirements, was derived from a single institution, raising the issue of dataset and selection bias. This limits generalizability across different populations, imaging equipment, and clinical contexts (both rural and urban cultural settings) within the country. Secondly, GeneXpert MTB/RIF results derived from clinical files were used as the reference standard. Although this test is sensitive and specific, it does not capture all cases of active tuberculosis, and misclassifications inherent in the test may have influenced performance estimates. Additionally, the CNN was fine-tuned using publicly available TB chest X-ray datasets, some of which were from the Ghanaian setting. However, the Ghanaian share of these data remains small and does not fully represent Ghanaian patients’ demographics, radiographic patterns, comorbidities, body habitus, skin tones, and imaging variability, potentially causing algorithmic bias, domain shift, and performance degradation in local contexts.

Finally, performance metrics such as positive and negative predictive values are influenced by disease prevalence, and may not directly translate to settings with different epidemiological profiles. These limitations underscore the need for multicenter validation studies, incorporation of diverse patient populations and cultures with continuous local calibration specific to the Ghanaian population to ensure sustained accuracy and equitable deployment of AI-assisted screening tools.
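The dependence of predictive values on prevalence follows directly from Bayes' rule. As a hedged illustration, the sketch below holds the AI system's reported sensitivity (86%) and specificity (93%) fixed and recomputes PPV and NPV at the study's prevalence of about 32% and at a hypothetical lower prevalence of 5%. The 5% figure is an assumption chosen for contrast, not a value from the study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY, SPECIFICITY = 0.86, 0.93  # rounded values reported for the AI

# Prevalence observed in this study (about 32%).
ppv_study, npv_study = predictive_values(SENSITIVITY, SPECIFICITY, 0.32)

# Hypothetical lower-prevalence screening population (assumed 5%).
ppv_low, npv_low = predictive_values(SENSITIVITY, SPECIFICITY, 0.05)

print(round(ppv_study, 2), round(npv_study, 2))
print(round(ppv_low, 2), round(npv_low, 2))
```

With the same sensitivity and specificity, PPV falls sharply as prevalence drops while NPV rises, which is why predictive values from one setting do not carry over to another.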

Conclusion

AI-assisted chest radiograph interpretation offers a promising solution to enhance early TB screening in resource-limited settings like Ghana. Its high diagnostic accuracy, scalability, and consistency position it as a valuable tool in the fight against TB. Strategic implementation, supported by local validation and policy frameworks, could significantly improve early detection and reduce TB-related morbidity and mortality.

Supporting information

S1 Data. Supplementary file data. (CSV)

Bibliography

1. Kwabla MP, Amuasi JH, Krause G, Dormechele W, Takramah W, Kye-Duodu G, et al. Completeness of tuberculosis case notification in Ghana: record linkage and capture-recapture analysis of three TB registries. BMC Infect Dis. 2025;25(1):206. doi: 10.1186/s12879-025-10622-1. PMID: 39934682. PMCID: PMC11817754.
2. Yayan J, Franke K-J, Berger M, Windisch W, Rasche K. Early detection of tuberculosis: a systematic review. Pneumonia (Nathan). 2024;16(1):11. doi: 10.1186/s41479-024-00133-z. PMID: 38965640. PMCID: PMC11225244.
3. Shah H, Patel J, Rai S, Sen A. Advancing tuberculosis elimination in India: a qualitative review of current strategies and areas for improvement in TB preventive treatment. IJID Reg. 2024;14:100556. doi: 10.1016/j.ijregi.2024.100556. PMID: 39866845. PMCID: PMC11761892.
4. Qin ZZ, Ahmed S, Sarker MS, Paul K, Adel ASS, Naheyan T, et al. Tuberculosis detection from chest X-rays for triaging in a high tuberculosis-burden setting: an evaluation of five artificial intelligence algorithms. Lancet Digit Health. 2021;3(9):e543–54. doi: 10.1016/S2589-7500(21)00116-3. PMID: 34446265.
5. Kamal R, Singh M, Roy S, Adhikari T, Gupta AK, Singh H, et al. A comparison of the quality of images of chest X-ray between handheld portable digital X-ray & routinely used digital X-ray machine. Indian J Med Res. 2023;157(2 & 3):204–10. doi: 10.4103/ijmr.ijmr_845_22. PMID: 37202939. PMCID: PMC10319375.
6. Lim WH, Kim H. Application of artificial intelligence in thoracic radiology: a narrative review. Tuberc Respir Dis (Seoul). 2025;88(2):278–91. doi: 10.4046/trd.2024.0062. PMID: 39689720. PMCID: PMC12010722.
7. Han ZL, Zhang YY, Li J, Gao S, Liu W, Yang WJ. A systematic review and meta-analysis of artificial intelligence software for tuberculosis diagnosis using chest X-ray imaging. J Thorac Dis. 2025;17:3223–37. doi: 10.21037/jtd-2025-604. PMID: 40529749. PMCID: PMC12170011.
8. de Camargo TFO, Ribeiro GAS, da Silva MCB, da Silva LO, Torres PPTES, Rodrigues D do S da S, et al. Clinical validation of an artificial intelligence algorithm for classifying tuberculosis and pulmonary findings in chest radiographs. Front Artif Intell. 2025;8:1512910. doi: 10.3389/frai.2025.1512910. PMID: 39991462. PMCID: PMC11843218.