Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations
Sihang Zeng, Zheng Yuan, Sheng Yu

TL;DR
This paper introduces CODER++, a contrastive learning method that enhances biomedical term embeddings by incorporating dynamic hard positive and negative samples, leading to improved clustering of biomedical concepts.
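To show why hard negatives matter in a contrastive objective, here is a minimal sketch that assumes a generic InfoNCE-style loss. This is not necessarily the exact loss CODER++ uses. It scores one anchor term against one positive and a list of negatives by cosine similarity. The function name `info_nce_with_hard_negatives`, the temperature of 0.05, and the toy 2-D vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_with_hard_negatives(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.05,  # hypothetical value, not taken from the paper
) -> float:
    """Generic InfoNCE loss: -log softmax probability of the positive among all candidates."""
    # Logit 0 is the anchor-positive pair; the rest are anchor-negative pairs.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy 2-D embeddings (made up): the easy negative is far from the anchor,
# while the hard negative is at least as close to the anchor as the true synonym.
anchor = [1.0, 0.0]
synonym = [0.9, 0.1]
easy_negative = [0.0, 1.0]
hard_negative = [0.95, 0.05]

loss_easy = info_nce_with_hard_negatives(anchor, synonym, [easy_negative])
loss_hard = info_nce_with_hard_negatives(anchor, synonym, [hard_negative])
print(f"easy: {loss_easy:.4f}  hard: {loss_hard:.4f}")
```

The hard negative yields a much larger loss. Its gradient therefore pushes the encoder hardest exactly where two near-identical strings denote different concepts, which is the intuition behind supplying hard samples, whatever loss the paper actually uses.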
Contribution
The paper proposes a novel sampling strategy for contrastive learning that produces fine-grained biomedical term representations, addressing the insensitivity of existing embeddings to minor textual differences.
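One plausible reading of "dynamic hard positive and negative samples" is that they are re-mined from the current model's embeddings as training proceeds. The sketch below follows that assumption. For one anchor, the hardest positive is the same-concept term the model currently finds least similar, and the hard negatives are the different-concept terms it currently finds most similar. The function `mine_hard_samples`, the concept labels, and the vectors are hypothetical. A real system over millions of terms would likely use an approximate nearest-neighbor index instead of this brute-force scan, and the paper's exact procedure may differ.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_hard_samples(
    embeddings: List[Sequence[float]],
    concept_ids: List[str],
    anchor: int,
    k: int = 1,
) -> Tuple[Optional[int], List[int]]:
    """Return (hardest positive index, top-k hardest negative indices) for one anchor.

    Hardest positive: same-concept term with the LOWEST similarity to the anchor.
    Hardest negatives: different-concept terms with the HIGHEST similarity.
    """
    scored = [
        (cosine(embeddings[anchor], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != anchor
    ]
    same = [p for p in scored if concept_ids[p[1]] == concept_ids[anchor]]
    different = [p for p in scored if concept_ids[p[1]] != concept_ids[anchor]]
    hard_positive = min(same)[1] if same else None
    hard_negatives = [idx for _, idx in sorted(different, reverse=True)[:k]]
    return hard_positive, hard_negatives


# Hypothetical terms, concept labels, and current-model embeddings.
terms = ["type 2 diabetes", "T2DM", "type 1 diabetes", "hypertension"]
concepts = ["C_A", "C_A", "C_B", "C_C"]
embeddings = [[1.0, 0.0], [0.6, 0.8], [0.99, 0.14], [0.0, 1.0]]

pos, negs = mine_hard_samples(embeddings, concepts, anchor=0, k=1)
print(terms[0], "-> hard positive:", terms[pos], "| hard negatives:", [terms[i] for i in negs])
```

Under this reading, "dynamic" means calling the miner again after the encoder is updated. The pairs it returns therefore track the model's current confusions, such as "type 1 diabetes" versus "type 2 diabetes", instead of a fixed list built once before training.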
Findings
CODER++ improves biomedical term clustering accuracy.
Enhanced embeddings better distinguish minor textual differences.
Application to the BIOS knowledge graph demonstrates the method's effectiveness.
Abstract
Term clustering is important in biomedical knowledge graph construction, and similarities between term embeddings are helpful for term clustering. State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonym and relation knowledge from knowledge graphs to guide contrastive learning. These embeddings place terms belonging to the same concept close together. However, our probing experiments show that these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering. To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples during contrastive learning. This yields fine-grained representations and, in turn, better biomedical term clustering. We name our proposed method CODER++, and it has been applied in…
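The abstract states that embedding similarity drives term clustering but does not name the clustering algorithm. As a minimal sketch, the snippet below assumes simple single-linkage clustering: every pair of terms whose cosine similarity reaches a threshold is linked, and connected components become clusters (tracked with union-find). The function `cluster_terms`, the thresholds, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_terms(embeddings: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """Single-linkage clustering: link every pair whose cosine similarity >= threshold."""
    parent = list(range(len(embeddings)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two components

    clusters: Dict[int, List[int]] = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up terms and embeddings for illustration only.
terms = ["type 2 diabetes", "type II diabetes", "type 1 diabetes", "hypertension"]
embeddings = [[1.0, 0.0], [0.98, 0.2], [0.6, 0.8], [0.0, 1.0]]

for threshold in (0.9, 0.5):
    groups = cluster_terms(embeddings, threshold)
    print(threshold, [[terms[i] for i in group] for group in groups])
```

At 0.9 only the two spellings of type 2 diabetes merge. At 0.5 single linkage chains all four terms into one cluster. The example illustrates why embeddings that separate near-identical strings of different concepts make a threshold-based clustering step far less brittle.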
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Linguistics and Terminology Studies · Natural Language Processing Techniques
Methods: Contrastive Learning
