An innovative solution for breast cancer textual big data analysis
Nicolas Thiebaut, Antoine Simoulin, Karl Neuberger, Issam Ibnouhsein, Nicolas Bousquet, Nathalie Reix, Sébastien Molière, Carole Mathelin

TL;DR
This paper presents a novel language-agnostic NLP system that efficiently extracts and structures clinical information from breast cancer EHRs, enabling retrospective studies with minimal manual effort.
Contribution
The work introduces a custom, ontology-free NLP approach combining text mining and synonym detection for structured analysis of breast cancer clinical reports.
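As a rough illustration of how text mining and synonym detection can be combined without an ontology, the sketch below groups spelling variants found in a report around a seed term and records whether each concept is mentioned. The summary above does not say how the authors detect synonyms, so character-level similarity from Python's standard library is used here purely as a stand-in; the function names, the threshold of 0.8, and the sample terms are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1] between two terms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_variants(seed, vocabulary, threshold=0.8):
    """Return vocabulary terms close enough to the seed term.

    Assumption: spelling similarity stands in for synonym detection;
    this is not the method used in the paper.
    """
    return sorted(w for w in vocabulary if similarity(seed, w) >= threshold)


def tokenize(text):
    """Lowercase the text and split it into runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def structure_report(text, concept_seeds, threshold=0.8):
    """Map each concept name to True if a variant of its seed term occurs."""
    tokens = set(tokenize(text))
    return {
        concept: bool(find_variants(seed, tokens, threshold))
        for concept, seed in concept_seeds.items()
    }


if __name__ == "__main__":
    # Hypothetical French report fragment and seed terms.
    report = "Patiente avec carcinome canalaire infiltrant. Mastectomie realisee."
    seeds = {
        "mastectomy": "mastectomie",
        "carcinoma": "carcinome",
        "chemotherapy": "chimiothérapie",
    }
    print(structure_report(report, seeds))
```

Because the seed terms are supplied by the user rather than drawn from a dictionary, the same code works on reports in any language, which is the property the contribution emphasizes. A real system would likely need a stronger notion of similarity than spelling alone.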
Findings
High extraction accuracy for key clinical concepts
No need for pre-existing ontologies or annotated corpora
Facilitates retrospective studies with reduced manual work
Abstract
The digitization of information stored in hospitals now makes it possible to exploit medical data in text format, such as electronic health records (EHRs), that were initially gathered for purposes other than epidemiology. Manual search and analysis of such data is tedious. In recent years, natural language processing (NLP) tools have been put forward to automate the extraction of information contained in EHRs, structure it, and perform statistical analysis on the structured result. The main difficulty with existing approaches is that they require synonym or ontology dictionaries, which are mostly available in English only and do not include local or custom notations. In this work, a team of oncologists, acting as domain experts, and data scientists develops a custom NLP-based system to process and structure textual clinical reports of patients suffering from…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Data Quality and Management · Topic Modeling
