ExtracTable: Human-in-the-Loop Transformation of Scientific Corpora into Structured Knowledge
Lena John, Ahmed Malek Ghanmi, Tim Wittenborg, Sören Auer, Oliver Karras

TL;DR
ExtracTable is a human-in-the-loop framework that combines large language models and user input to efficiently convert unstructured scientific literature into structured data for knowledge graphs, significantly reducing review time.
Contribution
It introduces a novel HITL workflow integrating LLMs with user validation for transforming scientific texts into structured knowledge representations.
Findings
High usability, with a System Usability Scale (SUS) score of 84.17
Reduced literature review time from hours to minutes
Effective integration into knowledge graphs
Abstract
As the volume of scientific literature grows, efficient knowledge organization becomes increasingly challenging. Traditional approaches to structuring scientific content are time-consuming and require significant domain expertise, highlighting the need for tool support. We present ExtracTable, a Human-in-the-Loop (HITL) workflow and framework that assists researchers in transforming unstructured publications into structured representations. The workflow combines large language models (LLMs) with user-defined schemas and is designed for downstream integration into knowledge graphs (KGs). Developed and evaluated in the context of the Open Research Knowledge Graph (ORKG), ExtracTable automates key steps such as document preprocessing and data extraction while ensuring user oversight through validation. In an evaluation with ORKG community participants following the Quality Improvement…
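The workflow described in the abstract (preprocessing, schema-guided extraction, user validation) can be illustrated with a minimal sketch. It is not ExtracTable's actual implementation: every name here (`SchemaField`, `ExtractedRow`, `preprocess`, `extract`, `validate`, `stub_extractor`) is hypothetical, and the LLM call is replaced by a trivial keyword-matching stand-in so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SchemaField:
    """One column of a user-defined schema (hypothetical structure)."""
    name: str
    description: str


@dataclass
class ExtractedRow:
    """Values extracted for one publication, plus a validation flag."""
    values: Dict[str, Optional[str]]
    validated: bool = False


def preprocess(raw_text: str) -> str:
    """Collapse whitespace; a placeholder for real document preprocessing."""
    return " ".join(raw_text.split())


def stub_extractor(text: str, field: SchemaField) -> Optional[str]:
    """Stand-in for an LLM call: first sentence that mentions the field name."""
    for sentence in text.split(". "):
        if field.name.lower() in sentence.lower():
            return sentence.strip(". ")
    return None


def extract(
    text: str,
    schema: List[SchemaField],
    extractor: Callable[[str, SchemaField], Optional[str]],
) -> ExtractedRow:
    """Fill every schema field using the supplied extractor."""
    return ExtractedRow(values={f.name: extractor(text, f) for f in schema})


def validate(
    row: ExtractedRow,
    review: Callable[[str, Optional[str]], Optional[str]],
) -> ExtractedRow:
    """Human-in-the-loop step: the reviewer accepts or corrects each value."""
    corrected = {name: review(name, value) for name, value in row.values.items()}
    return ExtractedRow(values=corrected, validated=True)


# Example run with a scripted reviewer in place of an interactive user.
schema = [
    SchemaField("method", "Approach proposed by the paper"),
    SchemaField("evaluation", "How the approach was evaluated"),
]
raw = "The method   is a HITL workflow. The evaluation reports a SUS score of 84.17."
draft = extract(preprocess(raw), schema, stub_extractor)


def scripted_review(name: str, value: Optional[str]) -> Optional[str]:
    # The reviewer shortens the 'method' value and accepts everything else.
    return "HITL workflow" if name == "method" else value


row = validate(draft, scripted_review)
```

Only rows with `validated` set to `True` would be passed on to a knowledge graph in a design like this, which keeps the user in control of what is published.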
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Graph Neural Networks · Topic Modeling
