Fake it till you predict it: data augmentation strategies to detect initiation and termination of oncology treatment

Valentin Pohyer (HEGP AP-HP); Elizabeth Fabre; Stéphane Oudard; Laure Fournier; Bastien Rance

arXiv:2410.10271·q-bio.QM·October 15, 2024

TL;DR

This paper introduces a method combining pattern recognition, dictionaries, transformer models, and data augmentation to extract treatment initiation and termination details from free-text oncology reports, improving information retrieval in clinical settings.

Contribution

The study presents a novel data augmentation strategy that enhances transformer-based models for extracting treatment events from unstructured oncology reports with minimal manual annotations.
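One way to picture data augmentation for this task is template filling: drug names and dates are substituted into sentence templates to generate labeled synthetic examples. The sketch below is only an illustration under that assumption. The templates, the drug and date lists, the `other` label, and the `augment` function are all hypothetical and are not taken from the paper, whose actual augmentation strategy may differ.

```python
import random

# Hypothetical sentence templates paired with an assumed relation label.
TEMPLATES = [
    ("{drug} was started on {date}.", "initiation"),
    ("{drug} was discontinued on {date}.", "termination"),
    ("Blood test performed on {date}; patient still on {drug}.", "other"),
]

# Hypothetical fill-in values; a real setup would draw on larger resources.
DRUGS = ["sunitinib", "cisplatin", "pembrolizumab"]
DATES = ["12/03/2021", "04/09/2021", "21/11/2022"]


def augment(n, seed=0):
    """Generate n synthetic labeled sentences by filling random templates."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    examples = []
    for _ in range(n):
        template, label = rng.choice(TEMPLATES)
        text = template.format(drug=rng.choice(DRUGS), date=rng.choice(DATES))
        examples.append({"text": text, "label": label})
    return examples


synthetic = augment(5)
for example in synthetic:
    print(example["label"], "|", example["text"])
```

Synthetic sentences of this kind could be mixed with the few manually annotated examples before fine-tuning a classifier.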

Findings

01

Achieved an F1-score of up to 0.872 in identifying treatment events.

02

Data augmentation improves model performance over non-augmented models.

03

Enables structuring of thousands of previously inaccessible treatment records.

Abstract

In hospitals, information about anti-cancer treatment is scattered, which makes it difficult to extract. We propose a solution that identifies dates, drugs, and their temporal relationships in free-text oncology reports using very few manual annotations. We use pattern recognition for dates, dictionaries for drugs, and transformer language models for the relationships, combined with a data augmentation strategy. Our models achieve good prediction F1-scores, reaching 0.872. Models trained with data augmentation outperform those trained without. By running inference with these models, we can now identify and structure thousands of previously unavailable treatment events to better understand treatments and patient response.
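The first two stages described in the abstract, pattern recognition for dates and dictionary lookup for drugs, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The date format, the three-entry `DRUG_DICTIONARY`, and the function names are assumptions. Each (drug, date) candidate pair would then go to a relation classifier such as a transformer model, which is not shown here.

```python
import re
from itertools import product

# Hypothetical mini-dictionary; a real one would come from a drug terminology.
DRUG_DICTIONARY = {"pembrolizumab", "cisplatin", "sunitinib"}

# Assumed date format: numeric dates such as 12/03/2021 or 12-03-21.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")


def find_dates(text):
    """Return (date string, character offset) for every date-like match."""
    return [(m.group(), m.start()) for m in DATE_PATTERN.finditer(text)]


def find_drugs(text):
    """Return (lowercased drug name, character offset) for dictionary hits."""
    hits = []
    for m in re.finditer(r"[A-Za-z]+", text):
        word = m.group().lower()
        if word in DRUG_DICTIONARY:
            hits.append((word, m.start()))
    return hits


def candidate_pairs(text):
    """Pair every detected drug with every detected date in the text."""
    return [
        (drug, date)
        for (drug, _), (date, _) in product(find_drugs(text), find_dates(text))
    ]


report = "Sunitinib started on 12/03/2021. Stopped on 04/09/2021 for progression."
pairs = candidate_pairs(report)
print(pairs)
```

Pairing every drug with every date keeps the rule-based stages simple and leaves the decision about whether a pair marks an initiation, a termination, or neither to the learned model.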
