# Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

**Authors:** Oren Melamud, Chaitanya Shivade

arXiv: 1905.07002 · 2019-05-23

## TL;DR

This paper proposes using neural language models to generate synthetic clinical notes that balance privacy preservation with utility for NLP tasks, aiming to facilitate data sharing while protecting patient confidentiality.

## Contribution

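It introduces a method for automatically generating synthetic clinical notes with generative neural language models trained on real de-identified records, and evaluates the resulting notes on both their privacy-preservation properties and their utility for training clinical NLP models.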
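The core loop is "train a generative model on de-identified notes, then sample new notes from it." The sketch below illustrates only that loop. It swaps in a word-level bigram model as a hypothetical stand-in for the paper's neural language models; the function names and the toy notes are illustrative assumptions, not the authors' code or data.

```python
import random
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers (assumed convention)


def train_bigram(notes):
    """Count word bigrams over whitespace-tokenized notes.

    Stand-in for training a neural LM; the notes are assumed to be de-identified already.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for note in notes:
        tokens = [BOS] + note.split() + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_note(counts, max_len=50, rng=None):
    """Generate one synthetic note by sampling each next word from the bigram counts."""
    rng = rng or random.Random()
    word, out = BOS, []
    for _ in range(max_len):
        successors = counts[word]
        if not successors:
            break
        words = list(successors)
        weights = [successors[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        if word == EOS:
            break
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Hypothetical toy "training notes" for illustration only.
    training_notes = [
        "patient admitted with chest pain",
        "patient discharged in stable condition",
    ]
    model = train_bigram(training_notes)
    print(sample_note(model, rng=random.Random(0)))
```

A bigram model copies long spans from tiny corpora and captures little context; a neural LM generalizes better, but the train-then-sample structure is the same.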

## Key findings

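- Notes generated by neural language models have utility close to that of real notes in some clinical NLP tasks.
- Synthetic notes are proposed as more amenable to sharing than de-identified records, which have been shown to be susceptible to adversarial attacks; their privacy-preservation properties are measured alongside utility.
- Results leave ample room for future improvements in note quality and in the balance between privacy and utility.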

## Abstract

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.
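The abstract does not spell out how privacy preservation is measured. As a purely hypothetical illustration, and not the paper's measure, one rough proxy for leakage is how often generated text copies long word sequences verbatim from the training notes. The function names and the choice of n are assumptions.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def memorization_rate(synthetic: Iterable[str], training: Iterable[str], n: int = 5) -> float:
    """Fraction of n-grams in synthetic notes that appear verbatim in training notes.

    Hypothetical privacy-risk proxy: higher values mean more copying from real records.
    Returns 0.0 when the synthetic notes contain no n-grams at all.
    """
    train_set: Set[Tuple[str, ...]] = set()
    for note in training:
        train_set.update(ngrams(note.split(), n))

    total = copied = 0
    for note in synthetic:
        for gram in ngrams(note.split(), n):
            total += 1
            if gram in train_set:
                copied += 1
    return copied / total if total else 0.0
```

Verbatim overlap only catches direct copying; it says nothing about paraphrased or inferable patient details, which is one reason dedicated privacy measures are needed.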

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.07002/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1905.07002/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1905.07002/full.md

---
Source: https://tomesphere.com/paper/1905.07002