A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications
Jerven Bolleman, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Severine Duvaud, Elisabeth Gasteiger, Dmitry Kuznetsov, Sebastien Moretti, Pierre-Andre Michel, Anne Morgat, Marco Pagni, Nicole Redaschi, Monique Zahn-Zabal

TL;DR
This paper presents a large, standardized collection of natural language questions and SPARQL queries over bioinformatics knowledge graphs, along with tools for visualization and query editing, to facilitate data retrieval and machine learning.
Contribution
It introduces a comprehensive dataset of over 1000 question-query pairs, a unified representation methodology, and open-source applications for knowledge graph querying in bioinformatics.
Findings
Collection includes 1000+ question-query pairs, including 65 federated queries.
Proposes a standardized, minimal-metadata representation methodology.
Provides reusable tools like query visualizations and editors.
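As a rough illustration of the findings above, the sketch below shows one way a question-query pair could be held in memory and checked for federation. It is not the paper's representation: the `QuestionQueryPair` record, its field names, the `ex:` vocabulary, and the example.org endpoints are all hypothetical. The only fact relied on is that SPARQL 1.1 expresses federation with the `SERVICE` keyword.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionQueryPair:
    """Hypothetical record layout; field names are illustrative assumptions."""
    question: str   # natural language question
    query: str      # SPARQL query text answering the question
    endpoint: str   # SPARQL endpoint the query is meant to be sent to


# SPARQL 1.1 federation: SERVICE [SILENT] <endpoint-iri> { ... }
# Heuristic only: this regex does not parse SPARQL, so it ignores
# SERVICE clauses that use a variable and can be fooled by comments.
SERVICE_PATTERN = re.compile(r"\bSERVICE\s+(?:SILENT\s+)?<([^>]+)>", re.IGNORECASE)


def federated_endpoints(pair: QuestionQueryPair) -> list[str]:
    """Return the remote endpoint IRIs named in SERVICE clauses."""
    return SERVICE_PATTERN.findall(pair.query)


def is_federated(pair: QuestionQueryPair) -> bool:
    """True if the query names at least one remote SERVICE endpoint."""
    return bool(federated_endpoints(pair))


# Made-up example pair; the vocabulary and endpoints are placeholders.
example = QuestionQueryPair(
    question="Which proteins catalyze a reaction involving a given compound?",
    query="""
PREFIX ex: <http://example.org/>
SELECT ?protein ?reaction WHERE {
  ?protein ex:catalyzes ?reaction .
  SERVICE <https://example.org/other/sparql> {
    ?reaction ex:hasParticipant ?compound .
  }
}
""",
    endpoint="https://example.org/sparql",
)

local_only = QuestionQueryPair(
    question="Which proteins are listed?",
    query="SELECT ?protein WHERE { ?protein a <http://example.org/Protein> . }",
    endpoint="https://example.org/sparql",
)
```

Calling `is_federated(example)` distinguishes the pair that reaches a second endpoint from `local_only`, which stays on a single knowledge graph. A real pipeline would more likely use a proper SPARQL parser than a regular expression.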
Abstract
Background. In the last decades, several life science resources have structured data using the same framework and made these accessible using the same query language to facilitate interoperability. Knowledge graphs have seen increased adoption in bioinformatics due to their advantages for representing data in a generic graph format. For example, yummydata.org catalogs more than 60 knowledge graphs accessible through SPARQL, a technical query language. Although SPARQL allows powerful, expressive queries, even across physically distributed knowledge graphs, formulating such queries is a challenge for most users. Therefore, to guide users in retrieving the relevant data, many of these resources provide representative examples. These examples can also be an important source of information for machine learning, if a sufficiently large number of examples are provided and published in a…
Taxonomy
Topics: Bioinformatics and Genomic Networks · Advanced Graph Neural Networks · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training
