Text2Node: a Cross-Domain System for Mapping Arbitrary Phrases to a Taxonomy
Rohollah Soltani, Alexandre Tomberg

TL;DR
Text2Node is a scalable, robust system that maps medical phrases to large taxonomies like SNOMED CT, generalizing from limited data and handling unseen concepts to improve healthcare data interoperability.
Contribution
The paper introduces a novel cross-domain mapping system that leverages embedding techniques and machine learning to connect medical phrases with taxonomies, overcoming scalability and generalization limitations of prior methods.
Findings
Mapped ICD-9-CM diagnosis phrases to SNOMED CT concepts
Showed comparable accuracy in a zero-shot setting, where target concepts are absent from the training data
Demonstrated robustness and generalization on large-scale healthcare datasets
Abstract
Electronic health record (EHR) systems are used extensively throughout the healthcare domain. However, data interchangeability between EHR systems is limited because different systems use different coding standards. Existing methods of mapping between coding standards, based on manual mapping by human experts, dictionary mapping, symbolic NLP, and classification, are unscalable and cannot accommodate large-scale EHR datasets. In this work, we present Text2Node, a cross-domain mapping system capable of mapping medical phrases to concepts in a large taxonomy (such as SNOMED CT). The system is designed to generalize from a limited set of training samples and map phrases to elements of the taxonomy that are not covered by training data. As a result, our system is scalable, robust to wording variants between coding systems and can output highly relevant concepts when no exact concept exists in the…
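The mapping idea in the abstract can be illustrated with a minimal sketch. It assumes that phrases and taxonomy nodes live in a shared vector space and that a phrase is mapped to its nearest nodes by cosine similarity. The vectors, the node labels, and the `nearest_nodes` helper are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math

# Hypothetical toy embeddings for three taxonomy nodes (values are made up).
# In a real system these might come from a graph embedding method.
NODE_VECTORS = {
    "Diabetes mellitus": [0.9, 0.1, 0.0],
    "Hypertensive disorder": [0.1, 0.9, 0.1],
    "Asthma": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_nodes(phrase_vector, node_vectors, k=2):
    """Return the k node names most similar to phrase_vector, best first."""
    scored = [(cosine(phrase_vector, vec), name)
              for name, vec in node_vectors.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Assumption: some trained model has already projected a phrase
# into the node embedding space, producing this vector.
predicted = [0.8, 0.3, 0.1]
top = nearest_nodes(predicted, NODE_VECTORS, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Because the lookup ranks every node in the space, it can return nodes that never appeared as training targets, which is one way to read the zero-shot behaviour the abstract describes.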
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: node2vec · fastText · GloVe Embeddings
