# Insights into Analogy Completion from the Biomedical Domain

**Authors:** Denis Newman-Griffis, Albert M Lai, Eric Fosler-Lussier

arXiv: 1706.02241 · 2017-06-08

## TL;DR

This paper analyzes the limitations of standard analogy completion methods in the biomedical domain, proposes modifications to address these issues, and introduces a new dataset, BMASS, to evaluate biomedical word embeddings.

## Contribution

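The paper identifies three assumptions built into standard analogy completion (Single Answer, Same Relationship, and Informative pairs) that do not always hold in biomedical data, proposes methodological changes that relax them, and presents the BMASS dataset for evaluating biomedical word embeddings.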

## Key findings

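- The relationships in BMASS pose significant semantic challenges to current word embedding methods.
- Standard analogy completion assumes a Single Answer per analogy; allowing multiple correct answers, and reporting MAP and MRR alongside accuracy, relaxes that assumption.
- Using multiple example pairs addresses cases where the pairs do not describe the Same Relationship or are not Informative with respect to each other.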
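
To make the multiple-answer evaluation concrete, here is a minimal sketch of accuracy, MAP, and MRR when an analogy may have several correct answers. It is not the paper's implementation: the function names, the toy rankings, and the choice to normalize average precision by the number of correct answers are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def accuracy_at_1(ranked: Sequence[str], correct: Set[str]) -> float:
    """1.0 if the top-ranked candidate is any of the correct answers."""
    return 1.0 if ranked and ranked[0] in correct else 0.0


def reciprocal_rank(ranked: Sequence[str], correct: Set[str]) -> float:
    """1 / rank of the first correct answer, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in correct:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], correct: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a correct answer appears.

    Assumption: normalized by the total number of correct answers, so
    answers missing from the ranking count against the score.
    """
    if not correct:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in correct:
            hits += 1
            total += hits / rank
    return total / len(correct)


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average the three metrics over (ranked candidates, correct set) pairs."""
    n = len(queries)
    return {
        "accuracy": sum(accuracy_at_1(r, c) for r, c in queries) / n,
        "MAP": sum(average_precision(r, c) for r, c in queries) / n,
        "MRR": sum(reciprocal_rank(r, c) for r, c in queries) / n,
    }


# Hypothetical rankings: the first analogy has two acceptable answers.
toy_queries = [
    (["aspirin", "ibuprofen", "fever", "naproxen"], {"ibuprofen", "naproxen"}),
    (["insulin", "glucose", "statin"], {"insulin"}),
]
scores = evaluate(toy_queries)
print(scores)
```

In the first toy query, accuracy is 0 because the top candidate is wrong, while MAP and MRR still give partial credit for the correct answers at ranks 2 and 4. That is the kind of distinction accuracy alone hides.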

## Abstract

Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains. Through an analysis of analogies in the biomedical domain, we identify three assumptions: that of a Single Answer for any given analogy, that the pairs involved describe the Same Relationship, and that each pair is Informative with respect to the other. We propose modifying the standard methodology to relax these assumptions by allowing for multiple correct answers, reporting MAP and MRR in addition to accuracy, and using multiple example pairs. We further present BMASS, a novel dataset for evaluating linguistic regularities in biomedical embeddings, and demonstrate that the relationships described in the dataset pose significant semantic challenges to current word embedding methods.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.02241/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1706.02241/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1706.02241/full.md

---
Source: https://tomesphere.com/paper/1706.02241