Automated Extraction of Number of Subjects in Randomised Controlled Trials

Abeed Sarker

arXiv:1606.07137·cs.AI·June 24, 2016

TL;DR

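This paper presents an approach that combines rule-based candidate extraction with supervised classification to automatically extract the number of subjects from the abstracts of randomised controlled trials (RCTs), reaching 88% accuracy with an annotated training set of only 201 RCTs.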

Contribution

It combines rule-based candidate extraction with supervised SVM classification over a small set of lexical, structural, and contextual features to identify study sizes in medical abstracts from a small annotated training set.

Findings

01

Achieved 88% accuracy with an annotated training set of only 201 RCTs.

02

Expected by the author to aid complex medical text processing tasks such as summarisation and question answering.

03

Combines rule-based and machine learning techniques for information extraction.

Abstract

We present a simple approach for automatically extracting the number of subjects involved in randomised controlled trials (RCT). Our approach first applies a set of rule-based techniques to extract candidate study sizes from the abstracts of the articles. Supervised classification is then performed over the candidates with support vector machines, using a small set of lexical, structural, and contextual features. With only a small annotated training set of 201 RCTs, we obtained an accuracy of 88%. We believe that this system will aid complex medical text processing tasks such as summarisation and question answering.
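The two-stage idea in the abstract (rule-based candidate extraction, then classification over per-candidate features) can be illustrated with a minimal Python sketch. The abstract does not list the paper's actual rules or features, so everything here is an assumption made for illustration: the regular expression, the cue-word list `SUBJECT_CUES`, and the function names `extract_candidates` and `candidate_features` are all hypothetical. The SVM step itself is not shown; the feature dictionaries produced below are the kind of input that a classifier could be trained on.

```python
import re

# Hypothetical rule: any integer, optionally with thousands separators.
NUMBER_RE = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")

# Hypothetical cue words that often follow a study size (not from the paper).
SUBJECT_CUES = {"patients", "participants", "subjects", "women", "men",
                "children", "adults", "volunteers"}


def extract_candidates(abstract):
    """Return (value, start, end) for each integer that could be a study size."""
    candidates = []
    for match in NUMBER_RE.finditer(abstract):
        start, end = match.start(), match.end()
        # Rule: skip percentages such as "95%".
        if abstract[end:end + 1] == "%":
            continue
        # Rule: skip either side of a decimal number such as "0.05".
        if start > 0 and abstract[start - 1] == ".":
            continue
        if abstract[end:end + 1] == "." and abstract[end + 1:end + 2].isdigit():
            continue
        value = int(match.group().replace(",", ""))
        candidates.append((value, start, end))
    return candidates


def candidate_features(abstract, candidate):
    """Build a small illustrative feature dictionary for one candidate."""
    value, start, end = candidate
    following_words = re.findall(r"[A-Za-z]+", abstract[end:end + 40])
    next_word = following_words[0].lower() if following_words else ""
    return {
        "value": value,                                         # the number itself
        "relative_position": start / max(len(abstract), 1),     # structural cue
        "next_word_is_cue": next_word in SUBJECT_CUES,          # contextual cue
    }


if __name__ == "__main__":
    text = ("A total of 120 patients were randomised to treatment for 12 weeks; "
            "95% completed follow-up (p = 0.05).")
    for cand in extract_candidates(text):
        print(cand[0], candidate_features(text, cand))
```

In this sketch the rules deliberately over-generate (both 120 and 12 survive as candidates), leaving it to the classifier to decide which candidate is the study size; that division of labour mirrors the pipeline the abstract describes, though the real system's rules and features may differ.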

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques