# From News to Medical: Cross-domain Discourse Segmentation

**Authors:** Elisa Ferracane, Titan Page, Junyi Jessy Li, Katrin Erk

arXiv: 1904.06682 · 2019-04-16

## TL;DR

This paper evaluates how well discourse segmenters trained on news text perform on English medical text. It uses the resulting error patterns to separate problems that can be fixed earlier in the pipeline from those that need more in-domain training data.

## Contribution

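The paper annotates a small, high-quality English medical corpus with discourse segments, which the authors describe as the first of its kind, and analyzes how news-trained segmenters perform on it. The error analysis distinguishes problems that can be addressed earlier in the pipeline from those that require a larger, trainable medical corpus.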

## Key findings

- News-trained segmenters perform worse on medical text, as the authors expected
- Some segmentation errors can be addressed earlier in the pipeline
- Other errors would require expanding the corpus to a trainable size so that models can learn the nuances of the medical domain
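
To make the notion of a cross-domain performance drop concrete, here is a minimal sketch of one common way to score a segmenter: precision, recall, and F1 over predicted segment boundaries. This summary does not state which metric the paper uses, so the metric is an assumption, and the function name `boundary_prf` and the toy boundary indices are hypothetical and purely illustrative.

```python
def boundary_prf(gold, pred):
    """Score predicted segment boundaries against gold boundaries.

    Both arguments are iterables of token indices at which a new
    discourse segment starts. Exact-match scoring is an assumption
    made for illustration, not the paper's confirmed protocol.
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy example: three gold boundaries, four predicted.
gold_boundaries = [4, 9, 15]
pred_boundaries = [4, 9, 12, 15]
precision, recall, f1 = boundary_prf(gold_boundaries, pred_boundaries)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Scoring the same segmenter on an in-domain test set and on a medical test set with a function like this gives two comparable numbers whose difference is the cross-domain drop.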

## Abstract

The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. While we expectedly find a drop in performance, the nature of the segmentation errors suggests some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.06682/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1904.06682/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1904.06682/full.md

---
Source: https://tomesphere.com/paper/1904.06682