KnowledgeHub: An end-to-end Tool for Assisted Scientific Discovery
Shinnosuke Tanaka, James Barry, Vishnudev Kuruvanthodi, Movina Moses, Maxwell J. Giammona, Nathan Herr, Mohab Elkaref, Geeth De Mel

TL;DR
KnowledgeHub is an integrated tool that facilitates scientific discovery through literature ingestion, ontology-based annotation, information extraction, knowledge graph construction, and LLM-powered question answering and summarization.
Contribution
It introduces an end-to-end pipeline that combines annotation, information extraction (IE), and LLM integration for scientific literature analysis; its novelty lies in covering the whole workflow within a single tool.
Findings
Supports PDF ingestion and structured data extraction
Enables ontology-based annotation and training of NER and RC models
Provides integrated QA and summarization grounded in document data
Abstract
This paper describes the KnowledgeHub tool, a scientific literature Information Extraction (IE) and Question Answering (QA) pipeline. This is achieved by supporting the ingestion of PDF documents that are converted to text and structured representations. An ontology can then be constructed where a user defines the types of entities and relationships they want to capture. A browser-based annotation tool enables annotating the contents of the PDF documents according to the ontology. Named Entity Recognition (NER) and Relation Classification (RC) models can be trained on the resulting annotations and can be used to annotate the unannotated portion of the documents. A knowledge graph is constructed from these entity and relation triples which can be queried to obtain insights from the data. Furthermore, we integrate a suite of Large Language Models (LLMs) that can be used for QA and…
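The ontology-to-knowledge-graph step described in the abstract can be illustrated with a minimal sketch. This is not KnowledgeHub's implementation, whose storage and query layer is not described here. The ontology, the entity names, the `build_graph` and `query` helpers, and the in-memory dictionary used as a graph are all hypothetical, chosen only to show how typed triples might be filtered against an ontology and then queried.

```python
from collections import defaultdict

# Hypothetical ontology: allowed (head type, relation, tail type) signatures.
ONTOLOGY = {
    "relations": {("Material", "has_property", "Property")},
}

def build_graph(triples, entity_types, ontology):
    """Keep only triples whose type signature is in the ontology, indexed by head."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        signature = (entity_types.get(head), relation, entity_types.get(tail))
        if signature in ontology["relations"]:
            graph[head].append((relation, tail))
    return graph

def query(graph, head, relation):
    """Return every tail entity linked to `head` by `relation`."""
    return [tail for rel, tail in graph.get(head, []) if rel == relation]

# Made-up placeholder entities, standing in for NER output.
entity_types = {
    "compound_A": "Material",
    "compound_B": "Material",
    "conductivity": "Property",
}

# Made-up placeholder triples, standing in for RC output.
triples = [
    ("compound_A", "has_property", "conductivity"),
    ("compound_A", "has_property", "compound_B"),  # rejected: tail is not a Property
]

graph = build_graph(triples, entity_types, ONTOLOGY)
result = query(graph, "compound_A", "has_property")
print(result)
```

In this sketch the second triple is dropped because its tail type does not match the ontology signature, so only the ontology-conformant relation is available to queries.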
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Ontology
