# Aligning Biomedical Metadata with Ontologies Using Clustering and Embeddings

**Authors:** Rafael S. Gonçalves, Maulik R. Kamdar, and Mark A. Musen

arXiv: 1903.08206 · 2020-12-17

## TL;DR

This paper introduces a clustering and embedding-based method to normalize and align heterogeneous biomedical metadata field names with ontology terms, improving consistency and queryability of scientific experiment metadata.

## Contribution

The paper presents a new method that combines clustering and embeddings (vector representations of words) to align metadata field names with ontology terms. In a comparative study, the method outperformed the NCBO Annotator.

## Key findings

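- In a comparative study against the NCBO Annotator, the method yields more alignments between metadata field names and ontology terms.
- The alignments it produces are also substantially better than those of the NCBO Annotator.
- Normalizing the distinct field names that describe the same type of value (e.g., lat lon, lat and long) is a step toward uniformly represented, queryable biomedical metadata.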

## Abstract

The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity: there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering scientific data, it is crucial that they be represented in a uniform way that can be queried effectively. One step toward uniformly represented metadata is to normalize the multiple, distinct field names used in metadata (e.g., lat lon, lat and long) to describe the same type of value. To that end, we present a new method based on clustering and embeddings (i.e., vector representations of words) to align metadata field names with ontology terms. We apply our method to biomedical metadata by generating embeddings for terms in biomedical ontologies from the BioPortal repository. We carried out a comparative study between our method and the NCBO Annotator, which revealed that our method yields more and substantially better alignments between metadata and ontology terms.
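The general idea in the abstract (group field-name variants, then match each group to the closest ontology term in a vector space) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes character trigram counts as a toy stand-in for learned embeddings, a greedy single-pass clustering with an arbitrary threshold of 0.5, and hypothetical field names and ontology labels. The function names `normalize`, `embed`, `cosine`, `cluster`, and `align` are invented for this example.

```python
import math
import re
from collections import Counter


def normalize(name: str) -> str:
    """Split camelCase, lowercase, and turn separators into single spaces."""
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding (assumption): character n-gram counts of the padded text."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(names, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each entry: (seed vector, list of member names)
    for name in names:
        vec = embed(normalize(name))
        for seed_vec, members in clusters:
            if cosine(vec, seed_vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


def align(members, term_labels):
    """Pick the label with the highest mean similarity to the cluster members."""
    member_vecs = [embed(normalize(m)) for m in members]

    def score(label):
        label_vec = embed(normalize(label))
        return sum(cosine(v, label_vec) for v in member_vecs) / len(member_vecs)

    best = max(term_labels, key=score)
    return best, score(best)


# Hypothetical inputs, chosen only to exercise the sketch.
fields = ["lat_lon", "lat long", "LatLong", "tissue_type", "tissue"]
labels = ["latitude", "tissue", "age"]

groups = cluster(fields)
for group in groups:
    label, similarity = align(group, labels)
    print(group, "->", label, round(similarity, 2))
```

In a realistic setting the `embed` function would be replaced by vectors learned from ontology text, and the clustering step by an established algorithm. The trigram approach here only captures surface string overlap, so it cannot relate synonyms that share no characters.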

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.08206/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1903.08206/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1903.08206/full.md

---
Source: https://tomesphere.com/paper/1903.08206