# Identifying translational science through embeddings of controlled vocabularies

**Authors:** Qing Ke

arXiv: 1812.10609 · 2020-07-13

## TL;DR

This paper introduces a vector-based method that quantifies how applied a biomedical research paper is, placing it on a continuous spectrum from basic to clinical research so that scientific translation can be better evaluated and understood.

## Contribution

The paper constructs a Translational Axis from vector representations of controlled-vocabulary terms and uses it to assign a continuous "appliedness" score to terms and papers, going beyond categorical classifications.

## Key findings

- Level scores agree closely with previous methods for measuring translational level
- Scores vary significantly among papers within previously defined categories
- Direct citations mostly link papers with similar scores, and shortest citation paths tend to end at papers closer to the basic end of the spectrum

## Abstract

Objective: Translational science aims at "translating" basic scientific discoveries into clinical applications. Identifying translational science has practical uses, such as evaluating the effectiveness of investments made in large programs like the Clinical and Translational Science Awards. Although several proposed methods group publications (the primary unit of research output) into categories, we still lack a quantitative way to place papers onto the full, continuous spectrum from basic research to clinical medicine.

Methods: Here we learn vector representations of controlled vocabularies assigned to MEDLINE papers to obtain a Translational Axis (TA) that points from basic science to clinical medicine. The projected position of a term on the TA, expressed as a continuous quantity, indicates the term's "appliedness." The position of a paper, determined by the average location over its terms, quantifies the degree of its "appliedness," which we term the "level score."

Results: We validate our method by comparing it with previous techniques, showing excellent agreement yet uncovering significant variation in the scores of papers within previously defined categories. The measure allows us to characterize the standing of journals, disciplines, and the entire biomedical literature along the basic-applied spectrum. Analysis of a large-scale citation network reveals two main findings. First, direct citations mainly occur between papers with similar scores. Second, shortest paths are more likely to end at a paper closer to the basic end of the spectrum, regardless of where the starting paper lies on the spectrum.

Conclusions: The proposed method provides a quantitative way to identify translational science.
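The scoring idea in the Methods can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how the axis is constructed, so the sketch takes the axis to be the unit vector from the centroid of a few hypothetical "basic" seed terms to the centroid of a few hypothetical "clinical" seed terms, and measures a term's position as the cosine between its vector and that axis. The 2-D vectors, term names, and function names are made up and are not learned from MEDLINE.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    # Scale a vector to length 1.
    norm = sqrt(dot(v, v))
    return [a / norm for a in v]

def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Toy 2-D term embeddings (made-up values, for illustration only).
term_vectors = {
    "Mice": [1.0, 0.0],
    "Cells": [0.8, 0.2],
    "Humans": [0.0, 1.0],
    "Clinical Trial": [0.2, 0.8],
    "Biomarkers": [0.5, 0.5],
}

# Hypothetical seed terms marking each end of the spectrum (an assumption).
basic_seeds = ["Mice", "Cells"]
clinical_seeds = ["Humans", "Clinical Trial"]

def translational_axis(vectors, basic, clinical):
    # Assumed construction: unit vector pointing from the basic centroid
    # to the clinical centroid.
    b = centroid([vectors[t] for t in basic])
    c = centroid([vectors[t] for t in clinical])
    return unit([ci - bi for bi, ci in zip(b, c)])

def term_score(term, vectors, axis):
    # Position of a term on the axis: cosine between its vector and the axis.
    return dot(unit(vectors[term]), axis)

def level_score(terms, vectors, axis):
    # Paper-level score: average position of the paper's terms on the axis.
    return sum(term_score(t, vectors, axis) for t in terms) / len(terms)

axis = translational_axis(term_vectors, basic_seeds, clinical_seeds)
paper_terms = ["Mice", "Biomarkers", "Humans"]
print(level_score(paper_terms, term_vectors, axis))
```

Under these assumptions, terms near the basic seeds score negative, terms near the clinical seeds score positive, and a paper's level score moves toward whichever end dominates its term list.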

---
Source: https://tomesphere.com/paper/1812.10609