The CALBC RDF Triple Store: retrieval over large literature content
Samuel Croset, Christoph Grabmüller, Chen Li, Silvestras Kavaliauskas, Dietrich Rebholz-Schuhmann

TL;DR
This paper presents the CALBC RDF Triple Store, a large-scale, integrated database of biomedical literature and resources that enables complex querying across multiple bioinformatics datasets for research insights.
Contribution
The paper introduces a comprehensive RDF Triple Store combining annotated biomedical literature with bioinformatics resources, facilitating integrated querying and analysis.
Findings
Contains 4,568,678 triples from literature annotations (the Silver Standard Corpus)
Integrates data from GeneAtlas, UniProtKb, and LexEBI
Enables querying of literature and bioinformatics data simultaneously
Abstract
Integration of the scientific literature into a biomedical research infrastructure requires processing the literature, identifying the named entities (NEs) and concepts it contains, and representing the content in a standardised way. The CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus covering four semantic groups through the harmonisation of annotations from automatic text mining solutions (Silver Standard Corpus, SSC). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). The content of the SSC has been fully integrated into an RDF Triple Store (4,568,678 triples) and has been aligned with content from the GeneAtlas (182,840 triples), UniProtKb (12,552,239 triples for human) and the lexical resource LexEBI (BioLexicon). The RDF Triple Store enables…
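To make the idea of querying literature annotations and bioinformatics resources together more concrete, the following is a minimal sketch of a toy in-memory triple store in plain Python. It is not the CALBC implementation: every identifier (`doc:1`, `prot:P1`), every predicate (`ex:mentions`, `ex:semanticGroup`, `ex:organism`) and both helper functions are hypothetical and chosen only for illustration. Only the semantic group labels PRGE and DISO come from the abstract.

```python
# Toy triple store sketch. All identifiers and predicates are hypothetical
# and are NOT taken from the actual CALBC schema.
from typing import Iterator, Optional

Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    # Literature-side triples: a document mentions an annotated entity.
    ("doc:1", "ex:mentions", "prot:P1"),
    ("doc:1", "ex:mentions", "dis:D1"),
    ("doc:2", "ex:mentions", "prot:P2"),
    # Each annotated entity belongs to one semantic group.
    ("prot:P1", "ex:semanticGroup", "PRGE"),
    ("prot:P2", "ex:semanticGroup", "PRGE"),
    ("dis:D1", "ex:semanticGroup", "DISO"),
    # Resource-side triples, standing in for a protein database.
    ("prot:P1", "ex:organism", "Homo sapiens"),
    ("prot:P2", "ex:organism", "Mus musculus"),
}


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for t in TRIPLES:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def documents_mentioning_human_proteins() -> list[str]:
    """Join literature and resource triples on the shared protein identifier."""
    docs: set[str] = set()
    for prot, _, _ in match(p="ex:organism", o="Homo sapiens"):
        # Keep only entities annotated as genes/proteins (PRGE).
        if any(match(s=prot, p="ex:semanticGroup", o="PRGE")):
            for doc, _, _ in match(p="ex:mentions", o=prot):
                docs.add(doc)
    return sorted(docs)


print(documents_mentioning_human_proteins())  # prints ['doc:1']
```

The join works because both sides of the data refer to the same protein identifier. A real triple store would express the same join as a SPARQL query with a shared variable rather than nested loops.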
