Domain-Relevant Embeddings for Medical Question Similarity
Clara McCreery, Namit Katariya, Anitha Kannan, Manish Chablani, Xavier Amatriain

TL;DR
This paper pre-trains a neural network on medical question-answer pairs as a semi-supervised intermediate task. Doing so improves accuracy on medical question similarity detection over other pre-training tasks, especially when training data is limited.
Contribution
The study demonstrates that pre-training on medical question-answer pairs enhances the performance of neural models in identifying similar medical questions, outperforming other pre-training tasks.
Findings
Achieved 82.6% accuracy on medical question similarity with the full training set.
Achieved 80.0% accuracy with a smaller training set.
Other pre-training tasks all scored below 78.7% accuracy with the same number of training examples.
Abstract
The rate at which medical questions are asked online far exceeds the capacity of qualified people to answer them, and many of these questions are not unique. Identifying same-question pairs could enable questions to be answered more effectively. While many research efforts have focused on the problem of general question similarity for non-medical applications, these approaches do not generalize well to the medical domain, where medical expertise is often required to determine semantic similarity. In this paper, we show how a semi-supervised approach of pre-training a neural network on medical question-answer pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pre-training tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, and an…
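The two-stage idea in the abstract (pre-train on question-answer pairs, then fine-tune on question-question pairs) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the paper uses a neural network, whereas this sketch substitutes a word-overlap logistic model, and the example sentences, function names, and hyperparameters are hypothetical and not taken from the paper. The only point it shows is that parameters learned in the first stage carry over into the second.

```python
import math
from collections import defaultdict

def overlap(a, b):
    """Words that appear in both texts (lowercased, whitespace-tokenized)."""
    return set(a.lower().split()) & set(b.lower().split())

def predict(weights, bias, a, b):
    """Probability that (a, b) is a matching pair under a logistic model."""
    z = bias + sum(weights[w] for w in overlap(a, b))
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, bias, pairs, epochs=50, lr=0.5):
    """Run SGD on labeled pairs; returns the updated (weights, bias)."""
    for _ in range(epochs):
        for a, b, label in pairs:
            error = predict(weights, bias, a, b) - label
            for w in overlap(a, b):
                weights[w] -= lr * error
            bias -= lr * error
    return weights, bias

# Stage 1 data (hypothetical): question-answer pairs, label 1 if the answer fits.
qa_pairs = [
    ("what helps a migraine headache", "migraine headache often responds to rest", 1),
    ("what helps a migraine headache", "a sprained ankle needs ice and rest", 0),
    ("how to treat a sprained ankle", "a sprained ankle needs ice and rest", 1),
    ("how to treat a sprained ankle", "migraine headache often responds to rest", 0),
]

# Stage 2 data (hypothetical): question-question pairs, label 1 if same question.
question_pairs = [
    ("is a migraine dangerous", "can a migraine be dangerous", 1),
    ("is a migraine dangerous", "is a sprained ankle dangerous", 0),
]

# Stage 1: pre-train on question-answer matching.
weights, bias = train(defaultdict(float), 0.0, qa_pairs)
pre_match = predict(weights, bias, *question_pairs[0][:2])
pre_other = predict(weights, bias, *question_pairs[1][:2])

# Stage 2: fine-tune the same parameters on question similarity.
weights, bias = train(weights, bias, question_pairs)
post_match = predict(weights, bias, *question_pairs[0][:2])
post_other = predict(weights, bias, *question_pairs[1][:2])

print("after pre-training:", pre_match, pre_other)
print("after fine-tuning: ", post_match, post_other)
```

In this toy setup the pre-trained weights already score the same-question pair above the different-question pair before any fine-tuning, because the weight learned for "migraine" in stage 1 is reused in stage 2. That is only a loose analogue of the transfer effect the abstract reports.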
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
