A Hassle-Free Machine Learning Method for Cohort Selection of Clinical Trials
Liu Man

TL;DR
This paper introduces a simple supervised machine learning approach to clinical trial cohort selection. It uses NER-based keyword features and FastText embeddings and does not rely on manually encoded clinical knowledge.
Contribution
It presents a novel ensemble system combining NER-based keyword features with FastText embeddings for effective clinical trial cohort classification.
Findings
Effective and fast method for clinical trial cohort selection
Does not require manual clinical knowledge or complex feature engineering
Achieves competitive performance with simple implementation
Abstract
Traditional text classification techniques in the clinical domain have relied heavily on manually extracted textual cues. This paper proposes a general supervised machine learning method that is hassle-free and does not use clinical knowledge. The methods employed are simple to implement, fast to run, and yet effective. The paper introduces a novel ensemble system based on named entity recognition (NER) that learns keyword features from the document. Rather than analyzing only the whole sentence or paragraph, the NER-based keyword features place more emphasis on important, clinically relevant phrases. In addition, to capture semantic information in the documents, the system uses features derived from document-level FastText classification results.
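The abstract describes two feature sources: keywords picked out by NER, and the output of a document-level FastText classifier. A minimal sketch of how these might be combined into one feature vector follows. It is an illustration under stated assumptions, not the author's implementation: the NER output is assumed to arrive as hypothetical `(text, label)` pairs, the keyword vocabulary and the class labels `"met"` / `"not met"` are hypothetical, and the FastText class probabilities are assumed to be precomputed and passed in as a dictionary.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical NER output for one document: (surface text, entity label) pairs.
Entity = Tuple[str, str]


def keyword_features(entities: Sequence[Entity], vocabulary: Sequence[str]) -> List[float]:
    """Count how often each vocabulary keyword occurs among recognized entity spans.

    Matching is exact and case-insensitive on the whole entity text; this is an
    illustrative choice (an assumption), not the paper's matching rule.
    """
    index = {kw.lower(): i for i, kw in enumerate(vocabulary)}
    counts = [0.0] * len(vocabulary)
    for text, _label in entities:
        i = index.get(text.lower())
        if i is not None:
            counts[i] += 1.0
    return counts


def fasttext_features(probabilities: Dict[str, float], labels: Sequence[str]) -> List[float]:
    """Arrange document-level class probabilities in a fixed label order.

    The probabilities are assumed to come from a FastText classifier applied
    to the whole document; missing labels default to 0.0.
    """
    return [probabilities.get(label, 0.0) for label in labels]


def build_feature_vector(
    entities: Sequence[Entity],
    vocabulary: Sequence[str],
    probabilities: Dict[str, float],
    labels: Sequence[str],
) -> List[float]:
    """Concatenate NER keyword-count features with FastText probability features."""
    return keyword_features(entities, vocabulary) + fasttext_features(probabilities, labels)


if __name__ == "__main__":
    vocab = ["insulin", "metformin", "myocardial infarction"]  # hypothetical keywords
    ents = [("Insulin", "DRUG"), ("metformin", "DRUG"), ("insulin", "DRUG")]
    probs = {"met": 0.8, "not met": 0.2}  # hypothetical FastText output
    print(build_feature_vector(ents, vocab, probs, ["met", "not met"]))
    # prints [2.0, 1.0, 0.0, 0.8, 0.2]
```

The resulting vector could then be fed to any standard classifier. Which downstream models the paper ensembles is not stated here, so that choice is left open.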
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Machine Learning and Algorithms
Methods: fastText
