# Is artificial data useful for biomedical Natural Language Processing algorithms?

**Authors:** Zixu Wang, Julia Ive, Sumithra Velupillai, Lucia Specia

arXiv: 1907.01055 · 2019-08-09

## TL;DR

This paper investigates whether artificially generated clinical text is useful for training biomedical NLP algorithms. It shows that synthetic data can boost performance when combined with real training data and, in some setups, can replace real training data entirely.

## Contribution

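The paper proposes a generic methodology that uses key phrases to guide the generation of clinical text, and it evaluates the generated data as training data for two biomedical NLP tasks: text classification and temporal relation extraction.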

## Key findings

- Artificial data improves neural network performance in text classification.
- Synthetic data can fully replace real training data in some NLP setups.
- Generated data used alongside real training data can boost the performance of data-greedy neural network algorithms.
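
The training setups mentioned in the findings above (real data only, real data augmented with artificial data, and artificial data only) can be illustrated with a minimal sketch. This is not the authors' code: the function name `build_training_set`, the mode names, and the toy examples are all hypothetical and chosen only to show how the three configurations differ in what the model is trained on.

```python
import random
from typing import List, Tuple

# A labelled example: (text, label). The type alias is illustrative only.
Example = Tuple[str, str]


def build_training_set(
    real: List[Example],
    artificial: List[Example],
    mode: str,
    seed: int = 0,
) -> List[Example]:
    """Assemble a training set for one of three hypothetical setups.

    mode == "real"             -> baseline: real data only
    mode == "real+artificial"  -> real data augmented with generated data
    mode == "artificial"       -> generated data fully replaces real data
    """
    if mode == "real":
        combined = list(real)
    elif mode == "real+artificial":
        combined = list(real) + list(artificial)
    elif mode == "artificial":
        combined = list(artificial)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Shuffle with a fixed seed so real and generated examples are interleaved
    # reproducibly.
    random.Random(seed).shuffle(combined)
    return combined


# Toy placeholder data, invented for this sketch (not taken from the paper).
real_data = [
    ("patient denies chest pain", "negative"),
    ("acute chest pain on exertion", "positive"),
]
artificial_data = [
    ("generated note mentioning chest pain", "positive"),
]

for mode in ("real", "real+artificial", "artificial"):
    train = build_training_set(real_data, artificial_data, mode)
    print(mode, len(train))
```

In every setup the evaluation data would remain real held-out text, so that any difference in scores reflects only the change in training data.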

## Abstract

A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility. This problem can be addressed by generating medical data artificially. Most previous studies have focused on the generation of short clinical text, and evaluation of the data utility has been limited. We propose a generic methodology to guide the generation of clinical text with key phrases. We use the artificial data as additional training data in two key biomedical NLP tasks: text classification and temporal relation extraction. We show that artificially generated training data used in conjunction with real training data can lead to performance boosts for data-greedy neural network algorithms. We also demonstrate the usefulness of the generated data for NLP setups where it fully replaces real training data.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01055/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01055/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1907.01055/full.md

---
Source: https://tomesphere.com/paper/1907.01055