STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking

Tek Raj Chhetri; Yibei Chen; Puja Trivedi; Dorota Jarecka; Saif Haobsh; Patrick Ray; Lydia Ng; Satrajit S. Ghosh

arXiv:2507.03674 · cs.CL · May 22, 2026

TL;DR

StructSense is a versatile, open-source framework for extracting structured information from scientific literature, combining symbolic knowledge, self-evaluation, and human validation to perform well across diverse tasks.

Contribution

The paper introduces a modular, task-agnostic system that combines ontology-guided extraction, agentic self-refinement, and human-in-the-loop validation for domain-aware information extraction.

Findings

1. Achieved 91-100% accuracy in schema-based extraction of assessment instruments.
2. Attained 86-93% overall accuracy in metadata and resource extraction from scientific papers.
3. Reached 58-75% label accuracy in neuroscience NER tasks across 8,882 entities.

Abstract

Extracting structured information from scientific literature is critical for accelerating discovery, yet Large Language Models (LLMs) often struggle in specialized domains that require expert knowledge and generalize poorly across tasks. We introduce StructSense, a modular, task-agnostic, open-source framework that integrates ontology-guided symbolic knowledge, agentic self-evaluative refinement, and human-in-the-loop validation for robust domain-aware extraction. We evaluate StructSense on three tasks of increasing semantic complexity: schema-based extraction of assessment instruments (91-100% accuracy), metadata and resource extraction from scientific papers (86-93% overall), and named entity recognition (NER) from neuroscience literature (58-75% label accuracy across 8,882 entities). On two biomedical NER benchmarks (NCBI Disease and S800 Species), the system…

Code & Models

Repositories

sensein/structsense (Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Time Series Analysis and Forecasting · Data Quality and Management