Fake it till you predict it: data augmentation strategies to detect initiation and termination of oncology treatment
Valentin Pohyer (HEGP AP-HP), Elizabeth Fabre, Stéphane Oudard, Laure Fournier, Bastien Rance

TL;DR
This paper introduces a method combining pattern recognition, dictionaries, transformer models, and data augmentation to extract treatment initiation and termination details from free-text oncology reports, improving information retrieval in clinical settings.
Contribution
The study presents a novel data augmentation strategy that enhances transformer-based models for extracting treatment events from unstructured oncology reports with minimal manual annotations.
Findings
The models reached F1-scores of up to 0.872 when identifying treatment events.
Models trained with data augmentation outperform those trained without it.
Enables structuring of thousands of previously inaccessible treatment records.
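For reference, the F1-score cited above is the harmonic mean of precision and recall. The summary does not say whether the reported value is micro- or macro-averaged, so the formula below is only the standard single-class definition, with TP, FP, and FN denoting true positives, false positives, and false negatives:

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 P R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
\]
```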
Abstract
In hospitals, information about anti-cancer treatment is scattered, which makes it difficult to extract. We propose a solution that identifies dates, drugs, and their temporal relationship within free-text oncology reports using very few manual annotations. We use pattern recognition for dates, dictionaries for drugs, and transformer language models for the relationship, combined with a data augmentation strategy. Our models achieve good prediction performance, with F1-scores reaching 0.872. Models trained with data augmentation outperform those trained without it. By running inference with these models, we can now identify and structure thousands of previously unavailable treatment events to better understand treatments and patient response.
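The first two stages named in the abstract (pattern recognition for dates, dictionaries for drugs) can be illustrated with a minimal sketch. This is not the authors' implementation: the date format, the three-entry `DRUG_DICTIONARY`, the function names, and the choice to pair every drug with every date are all assumptions made for illustration. In the approach described, a transformer model would then decide whether each pair marks an initiation, a termination, or no relation; that step is not shown.

```python
import re
from itertools import product

# Hypothetical mini-dictionary; a real one would come from a drug terminology.
DRUG_DICTIONARY = {"cisplatin", "pembrolizumab", "docetaxel"}

# Assumed date format: numeric dates such as 12/03/2021 or 12-03-21.
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")


def find_dates(text):
    """Return (start, end, surface form) for each date-like span."""
    return [(m.start(), m.end(), m.group()) for m in DATE_PATTERN.finditer(text)]


def find_drugs(text):
    """Return (start, end, lowercased name) for each dictionary drug mention."""
    spans = []
    for m in re.finditer(r"[A-Za-z]+", text):
        name = m.group().lower()
        if name in DRUG_DICTIONARY:
            spans.append((m.start(), m.end(), name))
    return spans


def candidate_pairs(text):
    """Pair every drug mention with every date; a classifier would label each pair."""
    return [
        (drug[2], date[2])
        for drug, date in product(find_drugs(text), find_dates(text))
    ]


report = "Cisplatin started on 12/03/2021. Docetaxel stopped on 04/06/2021."
pairs = candidate_pairs(report)
for drug, date in pairs:
    print(drug, date)
```

On the sample sentence this yields four candidate pairs (two drugs times two dates), of which only two express a real treatment event. Filtering such candidates is the job the abstract assigns to the transformer language model.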
