ctELM: Decoding and Manipulating Embeddings of Clinical Trials with Embedding Language Models
Brian Ondov, Chia-Hsuan Chang, Yujia Zhou, Mauro Giuffrè, Hua Xu

TL;DR
This paper introduces ctELM, an embedding language model tailored to clinical trials. It interprets, compares, and generates trial data from embeddings, improving the transparency and utility of embeddings in biomedical NLP.
Contribution
The work develops an open-source, domain-agnostic ELM architecture and training framework, designs training tasks for clinical trials, introduces an expert-validated synthetic dataset, and demonstrates that the resulting model can interpret and generate clinical trial information.
Findings
ctELM accurately describes unseen clinical trials from embeddings.
Generated trial abstracts respond to concept vectors like age and sex.
The approach improves interpretability and generative capabilities of embeddings in biomedical NLP.
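The concept-vector finding can be made concrete with a small sketch. It assumes a concept direction is estimated as the normalized difference between the mean embedding of trials that have a property (for example, older participants) and the mean embedding of trials that do not; this page does not state how the paper builds its concept vectors, so treat that construction as an illustrative assumption. The function names, the three-dimensional toy embeddings, and the step size `alpha` are all hypothetical. In the paper's setting, the shifted vector would then be handed to the ELM for decoding, a step not shown here.

```python
from math import sqrt


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def concept_vector(with_concept, without_concept):
    """Unit-length difference of group means (an assumed construction)."""
    pos = mean_vector(with_concept)
    neg = mean_vector(without_concept)
    diff = [p - q for p, q in zip(pos, neg)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("group means are identical; no direction to extract")
    return [d / norm for d in diff]


def shift(embedding, direction, alpha):
    """Move an embedding alpha units along a concept direction."""
    return [e + alpha * d for e, d in zip(embedding, direction)]


# Toy stand-ins for trial embeddings; real embeddings have far more dimensions.
older_trials = [[1.0, 0.0, 2.0], [1.0, 0.0, 4.0]]
younger_trials = [[1.0, 0.0, 0.0], [1.0, 0.0, 2.0]]

age_direction = concept_vector(older_trials, younger_trials)
shifted = shift([0.5, 0.5, 0.5], age_direction, alpha=2.0)
```

In this toy case the two groups differ only in the third coordinate, so the direction isolates that coordinate and the shift leaves the other two untouched.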
Abstract
Text embeddings have become an essential part of a variety of language applications. However, methods for interpreting, exploring and reversing embedding spaces are limited, reducing transparency and precluding potentially valuable generative use cases. In this work, we align Large Language Models to embeddings of clinical trials using the recently reported Embedding Language Model (ELM) method. We develop an open-source, domain-agnostic ELM architecture and training framework, design training tasks for clinical trials, and introduce an expert-validated synthetic dataset. We then train a series of ELMs exploring the impact of tasks and training regimes. Our final model, ctELM, can accurately describe and compare unseen clinical trials from embeddings alone and produce plausible clinical trials from novel vectors. We further show that generated trial abstracts are responsive to moving…
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Advanced Graph Neural Networks
