Automated Extraction of Number of Subjects in Randomised Controlled Trials
Abeed Sarker

TL;DR
This paper introduces an approach that combines rule-based extraction with machine learning to automatically extract the number of subjects from RCT abstracts, reaching 88% accuracy with a training set of only 201 annotated RCTs.
Contribution
It combines rule-based extraction with supervised SVM classification to accurately identify study sizes in medical abstracts using limited annotated data.
Findings
Achieved 88% accuracy with a training set of only 201 annotated RCTs.
Expected to aid complex medical text processing tasks such as summarisation and question answering.
Combines rule-based and machine learning techniques for information extraction.
Abstract
We present a simple approach for automatically extracting the number of subjects involved in randomised controlled trials (RCT). Our approach first applies a set of rule-based techniques to extract candidate study sizes from the abstracts of the articles. Supervised classification is then performed over the candidates with support vector machines, using a small set of lexical, structural, and contextual features. With only a small annotated training set of 201 RCTs, we obtained an accuracy of 88%. We believe that this system will aid complex medical text processing tasks such as summarisation and question answering.
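The first stage described in the abstract (rule-based candidate extraction, followed by feature construction for a classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regular expression, the cue-word list, the filters for percentages and decimals, and the feature names are all hypothetical. The SVM step itself is not shown; the feature dictionaries produced below would need to be vectorised and passed to an SVM implementation of your choice.

```python
import re

# Hypothetical rule: integers written as digits, optionally with
# thousands separators (e.g. "120" or "1,200"). Not the paper's rule set.
NUMBER_RE = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")

# Hypothetical cue words that often follow a study size.
SUBJECT_CUES = {
    "patients", "subjects", "participants", "volunteers",
    "women", "men", "children", "adults",
}


def extract_candidates(abstract):
    """Return (value, start, end) for each integer mention that
    survives simple filters for percentages and decimals."""
    candidates = []
    for m in NUMBER_RE.finditer(abstract):
        before = abstract[m.start() - 1:m.start()] if m.start() > 0 else ""
        after = abstract[m.end():m.end() + 1]
        after_next = abstract[m.end() + 1:m.end() + 2]
        if after == "%":
            continue  # percentage, e.g. "88%"
        if before == ".":
            continue  # fractional part of a decimal, e.g. the "5" in "12.5"
        if after == "." and after_next.isdigit():
            continue  # integer part of a decimal, e.g. the "12" in "12.5"
        value = int(m.group().replace(",", ""))
        candidates.append((value, m.start(), m.end()))
    return candidates


def features(abstract, candidate):
    """Build a small dict of lexical, structural, and contextual
    features for one candidate (feature names are illustrative)."""
    value, start, end = candidate
    next_words = re.findall(r"[A-Za-z]+", abstract[end:end + 40].lower())[:3]
    prev_words = re.findall(r"[A-Za-z]+", abstract[max(0, start - 40):start].lower())[-3:]
    return {
        "value": value,
        # Structural: where in the abstract the number appears (0.0 to 1.0).
        "relative_position": start / max(len(abstract), 1),
        # Contextual: does a subject-like noun follow within three words?
        "cue_follows": any(w in SUBJECT_CUES for w in next_words),
        # Lexical: the word immediately before the number.
        "prev_word": prev_words[-1] if prev_words else "",
    }


text = "A total of 120 patients were randomised; 12.5% withdrew after 6 weeks."
cands = extract_candidates(text)
print([c[0] for c in cands])  # [120, 6]
print(features(text, cands[0]))
```

In this sketch the rules are deliberately permissive: the duration "6" is kept as a candidate alongside the true study size "120", and it is left to the supervised classifier to separate the two using features such as `cue_follows`.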
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques
