TL;DR
StructSense is a versatile, open-source framework for extracting structured information from scientific literature, combining symbolic knowledge, self-evaluation, and human validation to perform well across diverse tasks.
Contribution
It introduces a modular, task-agnostic system that integrates ontology-guided extraction, self-refinement, and human-in-the-loop validation for domain-aware information extraction.
Findings
Achieved 91-100% accuracy in schema-based extraction of assessment instruments.
Attained 86-93% overall accuracy in metadata and resource extraction from scientific papers.
Reached 58-75% label accuracy in neuroscience NER tasks across 8,882 entities.
Abstract
Extracting structured information from scientific literature is critical for accelerating discovery, yet Large Language Models (LLMs) often struggle in specialized domains that require expert knowledge and generalize poorly across tasks. We introduce \textsc{StructSense}, a modular, task-agnostic, open-source framework that integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust domain-aware extraction. We evaluate \textsc{StructSense} on three tasks of increasing semantic complexity: schema-based extraction of assessment instruments (91--100\% accuracy), metadata and resource extraction from scientific papers (86--93\% overall), and named entity recognition (NER) from neuroscience literature (58--75\% label accuracy across 8,882 entities). On two biomedical NER benchmarks (NCBI Disease and S800 Species), the system…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Time Series Analysis and Forecasting · Data Quality and Management
