# Temporal Annotation of German Clinical Language in Real and Synthetic Clinical Documents: Corpus Development and Baseline Tagger Validation Study

**Authors:** Luise Modersohn, Udo Hahn

PMC · DOI: 10.2196/71458 · Journal of Medical Internet Research · 2026-02-25

## TL;DR

This paper introduces the first TimeML-compliant temporal annotation scheme for German clinical language, creating annotated corpora and baseline taggers for temporal information extraction.

## Contribution

The paper presents the first publicly accessible, temporally annotated clinical corpus for German and a TimeML-compliant annotation scheme tailored to German clinical language.

## Key findings

- A TimeML-compliant annotation schema was developed for German clinical language with high interannotator agreement (F1-score of 0.9) for temporal named entities.
- The GraSCCo-temp corpus is the first publicly available, temporally annotated German clinical dataset.
- Baseline taggers achieved F1-scores between 0.64 and 0.85 for temporal named entity recognition.

## Abstract

Temporal information about patients is a valuable source for clinical decision-making and medical treatment. The automatic extraction of such data from unstructured clinical narratives requires time-annotated clinical reports and notes from which time-informed taggers can be learned. Unfortunately, non-English clinical language communities, with the German one as a typical example, generally lack such time-annotated resources for training and evaluating temporal taggers, with only a few exceptions.

To overcome this metadata bottleneck, we developed a TimeML-conformant annotation schema for both temporal entities and temporal relations adapted to the needs of German medical language. Based on the annotations derived therefrom, we trained state-of-the-art baseline taggers to recognize temporal expressions in clinical documents.
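To make the notion of TimeML markup concrete, the following is a minimal sketch of how a single temporal expression could be wrapped in a TimeML `TIMEX3` element. The attribute names (`tid`, `type`, `value`) and the four type values come from the TimeML standard; the helper names `make_timex3` and `timex3_to_string` and the German phrase are illustrative assumptions, not taken from the corpora or the authors' tooling.

```python
import xml.etree.ElementTree as ET

# TIMEX3 types defined by the TimeML standard.
TIMEX3_TYPES = {"DATE", "TIME", "DURATION", "SET"}


def make_timex3(tid: str, timex_type: str, value: str, text: str) -> ET.Element:
    """Build one TIMEX3 element (hypothetical helper, for illustration only)."""
    if timex_type not in TIMEX3_TYPES:
        raise ValueError(f"unknown TIMEX3 type: {timex_type!r}")
    element = ET.Element("TIMEX3", {"tid": tid, "type": timex_type, "value": value})
    element.text = text
    return element


def timex3_to_string(element: ET.Element) -> str:
    """Serialize the element to an inline XML string."""
    return ET.tostring(element, encoding="unicode")


# Illustrative phrase (assumption): "3 Wochen" ("3 weeks"), normalized
# as the ISO 8601 duration P3W.
example = make_timex3("t1", "DURATION", "P3W", "3 Wochen")
print(timex3_to_string(example))
```

The normalized `value` is what downstream applications would typically consume, while the element text preserves the original wording of the clinical note.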

Starting from temporal annotation guidelines for English clinical documents, we developed preliminary annotation guidelines for temporal named entities and temporal relations for the German language. These guidelines were subsequently refined and adapted to German clinical jargon, incorporating the work experience of 5 clinically trained annotators (students of medicine). For this task, we used randomly selected smaller subsets of 2 German clinical corpora: a real-world one (3000PAJ) and a synthetic one (GraSCCo). Both corpora were annotated (3000PAJ partially, GraSCCo completely); on 3000PAJ, a randomly selected 10% of the documents served as the agreement set. To measure interannotator agreement (IAA), we computed pairwise F1-scores. We used the resulting annotations to develop BERT (Bidirectional Encoder Representations from Transformers)-based language models for the creation of time-sensitive baseline taggers. All annotations are based on TimeML, the international de facto standard for time information markup.
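The pairwise F1 computation used for IAA can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: annotations are modeled as sets of `(doc_id, start, end, label)` tuples, and two annotations count as a match only if span and label are identical. The abstract does not specify the matching criterion, so exact matching is an assumption here, as are the function names and the toy data.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# (document id, start offset, end offset, label): assumed representation.
Span = Tuple[str, int, int, str]


def f1(a: Set[Span], b: Set[Span]) -> float:
    """F1 between two annotation sets under exact span-and-label match.

    Treating either annotator as the reference gives the same value,
    because F1 = 2 * TP / (|a| + |b|) is symmetric.
    """
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    true_positives = len(a & b)
    return 2 * true_positives / (len(a) + len(b))


def pairwise_f1(annotations: Dict[str, Set[Span]]) -> Dict[Tuple[str, str], float]:
    """F1 for every unordered pair of annotators."""
    return {
        (x, y): f1(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


# Toy data (invented for illustration only).
toy = {
    "A": {("d1", 0, 8, "DATE"), ("d1", 20, 29, "DURATION"), ("d1", 40, 45, "TIME")},
    "B": {("d1", 0, 8, "DATE"), ("d1", 20, 29, "DURATION")},
    "C": {("d1", 0, 8, "DATE"), ("d1", 20, 29, "DATE"), ("d1", 40, 45, "TIME")},
}
scores = pairwise_f1(toy)
average = sum(scores.values()) / len(scores)
```

A relaxed variant that also credits overlapping spans would yield higher agreement values; which criterion applies to the figures reported here is not stated in this summary.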

We created 3000PAJ-temp, a time-annotated corpus of real clinical documents (which cannot be distributed because of the rigid privacy legislation enforced for German clinical data), and GraSCCo-temp, a synthetic one (which is publicly available without any restrictions). Based on the final guidelines, we achieved an IAA F1-score of 0.9 on both corpora for the temporal named entity recognition task. For the temporal relation extraction task, the IAA dropped to an F1-score of 0.57 on GraSCCo and 0.41 on 3000PAJ. Still, those results are comparable with those reported for English clinical datasets. Our baseline tagger for named entities reached F1-scores between 0.64 and 0.85. For automatic relation extraction, we achieved F1-scores ranging between 0.60 and 0.64.

We here introduce the first TimeML-compliant annotation scheme for time expressions occurring in German clinical language and apply it to 2 clinical corpora, one with nondistributable real clinical data, the other with distributable synthetic data. The latter constitutes the first publicly accessible, temporally annotated clinical corpus for the German language. The time tagger trained on these datasets is the first of its kind, fully compliant with the TimeML markup language. In terms of the amount of temporal metadata, our corpora are among the largest datasets ever produced for the clinical domain, compared with both English and German predecessors.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12980054/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12980054/full.md

## References

108 references — full list in the complete paper: https://tomesphere.com/paper/PMC12980054/full.md

---
Source: https://tomesphere.com/paper/PMC12980054