Assembly and reasoning over semantic mappings at scale for biomedical data integration
Charles Tapley Hoyt, Klas Karis, Benjamin M Gyori

TL;DR
This paper introduces SeMRA, a tool that helps integrate biomedical data by connecting identifiers from different resources using a graph-based approach.
Contribution
SeMRA introduces a scalable system for assembling and reasoning over semantic mappings across biomedical identifiers.
Findings
SeMRA integrates 43.4 million mappings from 127 sources covering 445 ontologies and databases.
Through inferred mappings, the system connects identifier spaces that previously had no direct mappings between them.
Benchmarks demonstrate successful integration of disease and cell type resources.
Abstract
Hundreds of resources assign identifiers to biomedical concepts including genes, small molecules, biological processes, diseases, and cell types. Often, these resources overlap by assigning identifiers to the same or related concepts. This creates a data interoperability bottleneck, as integrating data sets and knowledge bases that use identifiers for the same concepts from different resources requires such identifiers to be mapped to each other. However, available mappings are incomplete and fragmented across individual resources, motivating their large-scale integration. We developed SeMRA, a software tool that integrates mappings from multiple sources into a graph data structure. Using graph algorithms, it infers missing mappings implied by available ones while keeping track of provenance and confidence. This allows connecting identifier spaces between which direct mapping was…
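The inference step described in the abstract can be illustrated with a minimal sketch. This is not SeMRA's API: the `Mapping` class, the `invert` and `infer_chains` functions, the rule of multiplying confidences along a chain, and the identifiers in the example are all assumptions made for illustration. The sketch treats every mapping as an exact match, adds the inverse of each mapping, and then chains mappings that share an intermediate identifier, recording which mappings each inferred one was derived from.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mapping:
    """One exact-match mapping between two identifiers (hypothetical structure)."""
    subject: str
    obj: str
    confidence: float
    evidence: tuple = ()  # mappings this one was derived from; empty if asserted


def invert(m):
    """Exact matches are symmetric, so derive the reverse mapping."""
    return Mapping(m.obj, m.subject, m.confidence, (m,))


def infer_chains(mappings, max_rounds=3):
    """Infer mappings implied by chains such as A=B and B=C giving A=C.

    Assumption: confidence along a chain is the product of its parts.
    Keeps only the highest-confidence mapping per (subject, object) pair.
    """
    best = {}

    def add(m):
        key = (m.subject, m.obj)
        if m.subject == m.obj:
            return False  # skip trivial self-mappings
        if key in best and best[key].confidence >= m.confidence:
            return False
        best[key] = m
        return True

    for m in mappings:
        add(m)
        add(invert(m))

    for _ in range(max_rounds):
        changed = False
        current = list(best.values())
        for a, b in product(current, current):
            if a.obj == b.subject:
                chained = Mapping(
                    a.subject, b.obj, a.confidence * b.confidence, (a, b)
                )
                changed |= add(chained)
        if not changed:
            break
    return best


# Placeholder identifiers, not real records.
asserted = [
    Mapping("doid:0001", "mesh:D0001", 0.9),
    Mapping("mesh:D0001", "umls:C0001", 0.8),
]
inferred = infer_chains(asserted)
link = inferred[("doid:0001", "umls:C0001")]
print(link.confidence, [(e.subject, e.obj) for e in link.evidence])
```

Here the two asserted mappings share the intermediate identifier `mesh:D0001`, so the sketch derives a mapping between the first and third identifier spaces even though none was asserted directly, and the `evidence` field keeps the provenance of that derivation.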
Taxonomy
Topics: Bioinformatics and Genomic Networks · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
