# Observer: creation of a novel multimodal dataset for outpatient care research

**Authors:** Kevin B Johnson, Basam Alasaly, Kuk Jin Jang, Eric Eaton, Sriharsha Mopidevi, Ross Koppel

PMC · DOI: 10.1093/jamia/ocaf182 · 2025-10-27

## TL;DR

Observer is a new dataset combining video recordings, EHR data, and surveys from outpatient visits to support research and education in primary care.

## Contribution

The paper introduces Observer, a novel multimodal dataset for outpatient care research with privacy-preserving design and real-world clinical interactions.

## Key findings

- The first 100 visits involved 13 PCPs from 4 clinics, with a 61% patient consent rate.
- High satisfaction scores were reported by both patients and PCPs regarding the recording process.
- Room layout and camera placement influenced recorded communication patterns; both are now included in the dataset.

## Abstract

To support ambulatory care innovation, we created Observer, a multimodal dataset comprising videotaped outpatient visits, electronic health record (EHR) data, and structured surveys. This paper describes the data collection procedures and summarizes the clinical and contextual features of the dataset.

A multistakeholder steering group shaped the recruitment strategies, survey design, and privacy protections. Consented patients and primary care providers (PCPs) were recorded using room-view and egocentric cameras. EHR data, metadata, and audit logs were also captured. A custom de-identification pipeline combining transcript redaction, voice masking, and facial blurring ensured that both the video and EHR data are HIPAA compliant.
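The paper does not detail how its transcript-redaction stage works. The following is only a minimal, hypothetical sketch of the general idea: it replaces known names (assumed here to come from a list such as EHR demographics) plus simple phone-number and numeric-date patterns with placeholder tags. The function name `redact_transcript`, the patterns, and the tag format are illustrative assumptions, and a real HIPAA de-identification pipeline would need much broader identifier coverage.

```python
import re
from typing import Iterable

# Hypothetical patterns for illustration only; real de-identification must cover
# many more identifier types than phone numbers and numeric dates.
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")


def redact_transcript(text: str, known_names: Iterable[str]) -> str:
    """Replace known names, phone numbers, and numeric dates with placeholder tags."""
    # Redact longer names first so "Maria Lopez" is replaced before "Maria".
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    text = PHONE.sub("[PHONE]", text)
    text = DATE.sub("[DATE]", text)
    return text


print(redact_transcript("Hi Maria Lopez, call 215-555-0134 before 3/14/2024.",
                        ["Maria", "Maria Lopez"]))
```

In this sketch, sorting names by length keeps a full name from being only partially redacted. The voice-masking and facial-blurring stages operate on audio and video and are not sketched here.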

We report on the first 100 visits in this continually growing dataset. Thirteen PCPs from 4 clinics participated. Recording the first 100 visits required approaching 210 patients, of whom 129 (61%) consented; 29 of these missed their scheduled encounter after consenting. Visit lengths ranged from 5 to 100 minutes and covered care ranging from preventive visits to chronic disease management. Survey responses revealed high satisfaction: 4.24/5 (patients) and 3.94/5 (PCPs). Visit experience was unaffected by the presence of video recording technology.
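The recruitment figures reported above form a simple funnel. The sketch below recomputes them, assuming (as the counts imply) that the recorded visits are exactly the consenting patients minus those who missed their encounter. The class name `RecruitmentFunnel` and its properties are illustrative only and are not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecruitmentFunnel:
    """Hypothetical container for the recruitment counts reported in the abstract."""
    approached: int
    consented: int
    missed_after_consent: int

    @property
    def consent_rate(self) -> float:
        # Share of approached patients who consented.
        return self.consented / self.approached

    @property
    def recorded(self) -> int:
        # Assumption: every consenting patient who attended was recorded.
        return self.consented - self.missed_after_consent

    @property
    def approached_per_recorded_visit(self) -> float:
        # How many patients had to be approached per visit that was captured.
        return self.approached / self.recorded


funnel = RecruitmentFunnel(approached=210, consented=129, missed_after_consent=29)
print(f"consent rate: {funnel.consent_rate:.0%}")          # 61%
print(f"recorded visits: {funnel.recorded}")               # 100
print(f"approached per recorded visit: {funnel.approached_per_recorded_visit:.1f}")  # 2.1
```

Under this reading, roughly two patients had to be approached for each visit that ended up in the dataset.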

We demonstrate the feasibility of capturing rich, real-world primary care interactions using scalable, privacy-sensitive methods. Room layout and camera placement were key influences on recorded communication, and both are now included in the dataset. The Observer dataset enables future clinical AI research and development, communication studies, and informatics education among public and private user groups.

Observer is a new, shareable, real-world clinic encounter research and teaching resource with a representative sample of adult primary care data.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12844583/full.md

---
Source: https://tomesphere.com/paper/PMC12844583