# Assessing the Role of Medical Caption Technology to Support Physician-Patient Communication for Patients With Hearing Loss: Mixed Methods Pilot Study

**Authors:** Sarah E Hughes, Liang-Yuan Wu, Lindsay J Ma, Dhruv Jain, Michael M McKee

PMC · DOI: 10.2196/79073 · JMIR Rehabilitation and Assistive Technologies · 2026-01-15

## TL;DR

This study explores how real-time captioning technology can help deaf and hard-of-hearing patients communicate better with doctors during medical visits.

## Contribution

The study introduces a tailored speech recognition system for clinical settings and evaluates its usability and impact on communication for DHH individuals.

## Key findings

- Captions were rated easy to follow and trustworthy in 90% (26/29) of mock clinical scenarios.
- Participants reported feeling more confident and less anxious about missing critical medical information.
- The system's word error rate (12.7% to 22.8%) was consistent with benchmarks for automated speech recognition systems.

## Abstract

Speech recognition technology is widely used by individuals who are Deaf/deaf and hard-of-hearing (DHH) in everyday communication, but its clinical applications remain underexplored. Communication barriers in health care can compromise safety, understanding, and autonomy for individuals who are DHH.

This study aimed to evaluate a real-time speech recognition system (SRS) tailored for clinical settings, examining its usability, perceived effectiveness, and transcription accuracy among users who are DHH.

We conducted a pilot study with 10 adults who are DHH participating in mock outpatient encounters using a custom SRS powered by Google’s speech-to-text application programming interface. We used a convergent parallel mixed-methods design, collecting quantitative usability ratings and qualitative interview data during the same study session. These datasets were subsequently merged and jointly interpreted. Participants completed postscenario surveys and structured exit interviews assessing distraction, trust, ease of use, satisfaction, and emotional response. Caption accuracy was benchmarked against professional Communication Access Realtime Translation (CART) transcripts using word error rate (WER). Because WER assigns equal weight to all tokens, it does not differentiate between routine transcription errors and those involving safety-critical clinical terms (eg, medications or diagnoses). Therefore, WER may underestimate the potential impact of certain errors in medical contexts.
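The WER limitation described above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the whitespace tokenization, and the example sentences are all assumptions made for illustration. It computes WER as the word-level edit distance divided by the reference length, then shows two invented captions that receive the same WER even though only one of them garbles a medication name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: lowercased, whitespace-split tokens; real evaluations
    usually apply more careful text normalization first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def missed_critical_terms(reference: str, hypothesis: str, critical_terms):
    """Hypothetical helper: critical terms present in the reference
    but absent from the hypothesis."""
    ref_words = set(reference.lower().split())
    hyp_words = set(hypothesis.lower().split())
    return sorted(
        term for term in critical_terms
        if term.lower() in ref_words and term.lower() not in hyp_words
    )


# Invented example sentences, not data from the study.
reference = "take metformin twice daily with food"
caption_a = "take metformin twice daily with foot"   # routine error
caption_b = "take metoprolol twice daily with food"  # medication error

for caption in (caption_a, caption_b):
    wer = word_error_rate(reference, caption)
    missed = missed_critical_terms(reference, caption, ["metformin"])
    print(f"WER={wer:.3f} missed critical terms={missed}")
```

Both captions contain one substitution in a six-word reference, so their WER is identical, while the hypothetical critical-term check flags only the second. A term-aware check of this kind is one possible complement to WER, not a metric reported in the study.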

Across 29 clinical scenario simulations, participants rated the captions as nondistracting in 86% (25/29), easy to follow and trustworthy in 90% (26/29), and satisfactory overall in 76% (22/29). Participants described the SRS as intuitive, emotionally grounding, and preferable to lip reading in masked settings. WER ranged from 12.7% to 22.8%, consistent with benchmarks for automated SRSs. Interviews revealed themes of increased confidence in following clinical conversations and staying engaged despite masked communication. Participants reported less anxiety about missing critical medical information and expressed a strong interest in expanding the tool to real-world settings, especially for older adults or those with cognitive impairments.

Our findings support the potential of real-time captioning to enhance accessibility and reduce the cognitive and mental burden of communication for individuals who are DHH in clinical care. Participants described the SRS as both functionally effective and personally empowering. While accuracy for complex medical terminology remains a limitation, participants consistently expressed trust in the system and a desire for its integration into clinical care. Future research should explore real-world implementation, domain-specific optimization, and the development of user-centered evaluation metrics that extend beyond transcription fidelity to include trust, autonomy, and communication equity.

## Full-text entities

- **Diseases:** Hearing Loss (MESH:D034381), DHH (MESH:D003638), hard-of-hearing (MESH:D018804), cognitive impairments (MESH:D003072), anxiety (MESH:D001007)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12806592/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12806592/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12806592/full.md

---
Source: https://tomesphere.com/paper/PMC12806592