# A Bilingual Arabic-English Ambient AI Scribe for Clinical Documentation: Prospective Evaluation Study

**Authors:** Umair Tahir Khan, Ammar Tahir Khan, Waleed Aljaadi, Razan Alhadlaq, Zahran Baqashmer, Yasin Alsafi, Yousef Alomran, Maha Al Rusaiyes, Muaddiyah Radif, Tahir Naeem Khan, Saleh Abdullah Saleh Altamimi

PMC · DOI: 10.2196/83335 · JMIR Medical Informatics · 2026-03-24

## TL;DR

This study evaluates a bilingual Arabic-English AI scribe, Sahl AI, showing it can produce high-quality clinical notes and reduce physician documentation burden.

## Contribution

The first end-to-end evaluation, from raw audio to clinical note, of a bilingual ambient AI scribe for a low-resource language such as Arabic.

## Key findings

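- In the accuracy domain, version 2 of Sahl AI scored a mean of 4.53 (SD 0.46) out of 5 for Arabic notes and 4.77 (SD 0.37) out of 5 for English notes (P=.06).
- In a targeted survey of 22 family medicine physicians, most respondents agreed that Sahl AI notes were comprehensive and perceived potential time savings and reduced burnout.
- Internal consistency (mean 4.94, SD 0.17) and comprehensibility (mean 4.89, SD 0.21) were the top-rated domains.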

## Abstract

Medical ambient artificial intelligence (AI) scribes reduce documentation burden, but the current evidence is almost entirely from English systems. In the Arabic-speaking world, physicians converse mainly in Arabic and write clinical notes in English, adding cognitive burden. Due to scarce corpora in the Arabic language, the development of Arabic-enabled AI speech technologies has been challenging. Here, we address this gap by developing and evaluating a bilingual Arabic-English medical AI scribe.

This study aims to evaluate the feasibility and performance of a bilingual medical Arabic-English ambient AI scribe, Sahl AI, using a full end-to-end methodology from raw audio to clinical note.

A prospective, single-arm feasibility pilot study was conducted in 2 stages, development and implementation, across outpatient clinics, inpatient services, and primary care clinics within Riyadh First Health Cluster. In stage 1 (development), consultation audios were collected and manually annotated to fine-tune the AI pipeline. Version 1 of Sahl AI underwent feasibility evaluation using a modified 9-item Physician Documentation Quality Instrument (PDQI-9) framework based on Likert-scale ratings. Subsequently, in stage 2 (implementation), the AI pipeline was fine-tuned, and version 2 was evaluated as a real-world deployment in family medicine clinics. Independent reviewers evaluated clinical notes. Physician experience was captured using targeted surveys.

During stage 1, the AI pipeline was fine-tuned, producing version 1 of Sahl AI, which was tested in 64 clinical encounters for technical feasibility. This evaluation yielded a mean modified PDQI-9 score of 42.2 (SD 2.98) out of 45, with the model scoring 4.35 (SD 0.82) out of 5 in the accuracy domain. Following further optimization, version 2 of Sahl AI was evaluated during stage 2, using 55 real-world consultations assessed by 2 independent physician evaluators. Sahl AI achieved mean modified PDQI-9 scores of 42.4 (SD 1.84) out of 45 for Arabic and 37.8 (SD 1.10) out of 40 for English. On the Likert scale, for the accuracy domain, the mean score was 4.53 (SD 0.46) out of 5 for Arabic compared with 4.77 (SD 0.37) out of 5 for English (P=.06). Internal consistency (mean 4.94, SD 0.17) and comprehensibility (mean 4.89, SD 0.21) were the top-rated domains. The targeted survey of a larger cohort of 22 family medicine physicians using Sahl AI showed that most responding physicians agreed that notes were comprehensive and perceived potential time savings and reduced burnout.
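Because the Arabic and English totals are reported against different maxima (45 and 40), the raw means are not directly comparable. A minimal sketch of one way to put them on a common scale is to express each mean as a percentage of its maximum. The helper name `percent_of_max` is hypothetical, and this normalization is an illustration only, not an analysis described in the study.

```python
def percent_of_max(mean_score: float, max_score: float) -> float:
    """Express a summed Likert score as a percentage of its maximum."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100.0 * mean_score / max_score


# Mean modified PDQI-9 totals and their maxima, as reported in the abstract.
reported = {
    "Version 1 (stage 1)": (42.2, 45),
    "Version 2, Arabic": (42.4, 45),
    "Version 2, English": (37.8, 40),
}

for label, (mean, maximum) in reported.items():
    print(f"{label}: {percent_of_max(mean, maximum):.1f}% of maximum")
```

On this scale all three totals fall between roughly 93% and 95% of their maxima. The calculation uses only the reported means, so it says nothing about whether the differences are statistically meaningful.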

Sahl AI, a bilingual Arabic-English medical ambient AI scribe, generates accurate and high-quality notes. In targeted postimplementation surveys, most physicians agreed that it produced comprehensive notes and perceived potential benefits for saving time and for reducing stress or burnout. This provides the first empirical basis for rigorous end-to-end ambient AI scribe evaluation in low-resource languages such as Arabic.

## Full-text entities

- **Diseases:** asthma (MESH:D001249), cognitive impairment (MESH:D003072), allergy (MESH:D004342), chest pain (MESH:D002637), AI (MESH:C538142), hallucination (MESH:D006212), abdominal pain (MESH:D015746), burnout (MESH:D002055), visual loss (MESH:D014786)
- **Chemicals:** alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012219/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13012219/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012219/full.md

---
Source: https://tomesphere.com/paper/PMC13012219