Classifiers of Data Sharing Statements in Clinical Trial Records
Saber Jelodari Mamaghani, Cosima Strantz, Dennis Toddenroth

TL;DR
This study evaluates the effectiveness of domain-specific pre-trained language models in classifying data sharing statements in clinical trial records, highlighting their potential to improve automatic identification of available individual participant data.
Contribution
The paper shows that classifiers trained to predict manual annotations outperform those trained to predict the original availability categories, which suggests that textual data sharing statements carry information beyond the predefined categories.
Findings
Classifiers trained to predict manual annotations outperform those trained to predict the original availability categories.
Domain-specific language models effectively interpret data sharing statements.
Textual descriptions contain more information than existing categorical labels.
Abstract
Digital individual participant data (IPD) from clinical trials are increasingly distributed for potential scientific reuse. The identification of available IPD, however, requires interpretations of textual data-sharing statements (DSS) in large databases. Recent advancements in computational linguistics include pre-trained language models that promise to simplify the implementation of effective classifiers based on textual inputs. In a subset of 5,000 textual DSS from ClinicalTrials.gov, we evaluate how well classifiers based on domain-specific pre-trained language models reproduce original availability categories as well as manually annotated labels. Typical metrics indicate that classifiers that predicted manual annotations outperformed those that learned to output the original availability categories. This suggests that the textual DSS descriptions contain applicable information that…
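The abstract compares two label targets using "typical metrics" without naming them in this excerpt. As an illustration only, the sketch below assumes those metrics are per-class precision, recall, and F1 plus a macro-averaged F1, and computes them in plain Python. The label names (`yes`, `no`, `undecided`), the function names, and the toy predictions are hypothetical placeholders, not data or results from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for lab in labels:
        pred_total = tp[lab] + fp[lab]
        true_total = tp[lab] + fn[lab]
        prec = tp[lab] / pred_total if pred_total else 0.0
        rec = tp[lab] / true_total if true_total else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[lab] = (prec, rec, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy labels, NOT results from the paper.
toy_true = ["yes", "no", "no", "undecided", "yes", "no"]
toy_pred = ["yes", "no", "undecided", "undecided", "no", "no"]
print(round(macro_f1(toy_true, toy_pred), 3))  # 0.667
```

Under this assumption, the comparison described in the abstract would amount to running the same scoring twice on held-out statements: once with the original availability categories as `y_true` and once with the manual annotations, each against the predictions of the classifier trained on that target.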
