Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data

Ana Claudia Sima, Tarcisio Mendes de Farias, Maria Anisimova, Christophe Dessimoz, Marc Robinson-Rechavi, Erich Zbinden, and Kurt Stockinger

arXiv:2104.13744·cs.DB·June 15, 2021

TL;DR

Bio-SODA is a question answering system over scientific knowledge graphs that requires no training data. It translates questions into SPARQL with a graph-based approach, ranks the candidate queries using node centrality, and outperforms existing systems.
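This page does not spell out how the graph-based translation works. Purely as an illustration, the sketch below assumes that question terms have already been matched to two schema classes, connects them by a shortest path in a toy schema graph (breadth-first search), and emits one SPARQL query from that path. The class names, predicates, the `http://example.org/` namespace, and the helpers `shortest_path` and `to_sparql` are all hypothetical; this is not Bio-SODA's actual algorithm.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace for the toy schema


def shortest_path(schema_edges, start, goal):
    """Breadth-first search treating schema edges as undirected.

    schema_edges: (subject_class, predicate, object_class) triples.
    Returns the triples on one shortest path from start to goal, or None if unreachable.
    """
    neighbours = {}
    for s, p, o in schema_edges:
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))
    previous = {start: None}  # node -> (predecessor, triple used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, triple in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, triple)
                queue.append(nxt)
    if goal not in previous:
        return None
    # Walk back from the goal to the start, then reverse into start-to-goal order.
    path, node = [], goal
    while previous[node] is not None:
        node, triple = previous[node]
        path.append(triple)
    return path[::-1]


def to_sparql(path, answer_class):
    """Emit one SPARQL SELECT whose triple patterns follow the path (one variable per class)."""
    def var(cls):
        return "?" + cls.lower()

    patterns = [f"{var(s)} <{EX}{p}> {var(o)} ." for s, p, o in path]
    classes = {answer_class} | {c for s, _, o in path for c in (s, o)}
    patterns += [f"{var(c)} a <{EX}{c}> ." for c in sorted(classes)]
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT {var(answer_class)} WHERE {{\n  {body}\n}}"
```

For a toy schema where `Gene encodes Protein` and `Protein inTaxon Taxon`, connecting `Gene` to `Taxon` yields a two-hop path and a query selecting `?gene`. A real question may match several classes or literal values, producing many such paths and therefore many candidate queries, which is why a ranking step is needed.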

Contribution

The paper introduces Bio-SODA, a novel training-free natural language processing engine for scientific knowledge graphs that translates questions into SPARQL using graph-based methods and node centrality.
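To illustrate what ranking candidate queries by node centrality can look like, here is a minimal sketch. It assumes PageRank as the centrality measure and a hypothetical linear blend (weight `alpha`) of a precomputed string-similarity score with the candidate's mean normalized centrality. The `Candidate` type, `rank_candidates`, and the scoring formula are illustrative assumptions, not the paper's exact ranking function.

```python
from __future__ import annotations

from dataclasses import dataclass


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed edge list of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


@dataclass
class Candidate:
    """A candidate SPARQL query (hypothetical structure for this sketch)."""
    sparql: str
    nodes: list[str]   # KG nodes the query touches
    similarity: float  # precomputed match between question terms and node labels, in [0, 1]


def rank_candidates(candidates, centrality, alpha=0.5):
    """Sort candidates best-first.

    Hypothetical score: alpha * similarity + (1 - alpha) * mean centrality of the
    candidate's nodes, normalized by the most central node in the graph.
    """
    top = max(centrality.values())

    def score(candidate):
        importance = sum(centrality[node] for node in candidate.nodes) / len(candidate.nodes)
        return alpha * candidate.similarity + (1.0 - alpha) * importance / top

    return sorted(candidates, key=score, reverse=True)
```

With equal string similarity, a candidate touching a highly connected node (for example a `Protein` class that many other classes link to) outranks one touching a peripheral node. The blend lets centrality break ties between lexically similar matches without overriding a clearly better lexical match.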

Findings

1. Bio-SODA outperforms existing KGQA systems by at least 20% in F1-score.
2. It handles complex scientific datasets effectively without training data.
3. Its experiments include the bioinformatics QALD (Question Answering over Linked Data) challenge.
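For reference, F1 is the harmonic mean of precision and recall. In QALD-style KGQA evaluation these are typically computed per question q from the system's answer set A_q and the gold answer set G_q (notation introduced here) and then averaged over all questions:

```latex
P_q = \frac{|A_q \cap G_q|}{|A_q|}, \qquad
R_q = \frac{|A_q \cap G_q|}{|G_q|}, \qquad
F1_q = \frac{2\, P_q R_q}{P_q + R_q}
```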

Abstract

The problem of natural language processing over structured data has become a growing research field, both within the relational database and Semantic Web communities, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for…

Code

anazhaw/Bio-SODA (official repository)
