ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery

Shahar Levy; Eliya Habba; Reshef Mintz; Barak Raveh; Renana Keydar; Gabriel Stanovsky

arXiv:2604.09237·cs.CL·April 13, 2026

TL;DR

ScheMatiQ is an interactive system that uses large language models to convert research questions into structured data, facilitating analysis across disciplines with minimal manual effort.

Contribution

The paper introduces an LLM-based approach to automatic schema generation and data grounding, supported by a web interface for expert-guided refinement.

Findings

01 Supports real-world analysis in law and biology
02 Produces schemas and databases with minimal manual labeling
03 Open source release with web interface and resources

Abstract

Many disciplines pose natural-language research questions over large document collections, and answering them typically requires structured evidence. Traditionally, that evidence is obtained by manually designing an annotation schema and exhaustively labeling the corpus, a slow and error-prone process. We introduce ScheMatiQ, which uses calls to a backbone LLM to turn a question and a corpus into a schema and a grounded database, with a web interface that lets users steer and revise the extraction. In collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology. We release ScheMatiQ as open source with a public web interface, and invite experts across disciplines to use it with their own data. All resources, including the website, source code, and demonstration video, are available at: www.ScheMatiQ-ai.com
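To make the question-to-database flow described in the abstract more concrete, the following is a minimal sketch of how such a pipeline could be organized. It is not ScheMatiQ's actual code or API: the function names (`discover_schema`, `ground_document`, `build_database`), the prompt formats, the `Cell` type, and the keyword-matching `toy_llm` stand-in are all assumptions made for illustration. A real system would call an actual backbone model where `toy_llm` is used here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Cell:
    value: str     # extracted value for one schema field
    evidence: str  # sentence from the document that supports the value


def discover_schema(question: str, corpus: List[str], llm: LLM) -> List[str]:
    """Ask the backbone model to propose field names relevant to the question."""
    sample = "\n".join(corpus[:3])
    reply = llm(f"SCHEMA\nQuestion: {question}\nSample documents:\n{sample}")
    return [name.strip() for name in reply.split(",") if name.strip()]


def ground_document(doc: str, schema: List[str], llm: LLM) -> Dict[str, Cell]:
    """Fill one database row, pairing each value with a supporting sentence."""
    row: Dict[str, Cell] = {}
    for field_name in schema:
        value = llm(f"EXTRACT\nField: {field_name}\nDocument: {doc}").strip()
        # Grounding (assumed strategy): keep the first sentence containing the value.
        evidence = next(
            (s.strip() for s in doc.split(".") if value and value in s), ""
        )
        row[field_name] = Cell(value, evidence)
    return row


def build_database(
    question: str,
    corpus: List[str],
    llm: LLM,
    schema: Optional[List[str]] = None,
) -> Tuple[List[str], List[Dict[str, Cell]]]:
    """Run schema discovery, then extraction; an expert-edited schema overrides discovery."""
    schema = schema or discover_schema(question, corpus, llm)
    return schema, [ground_document(doc, schema, llm) for doc in corpus]


def toy_llm(prompt: str) -> str:
    """Deterministic keyword-matching stand-in for a real model (illustration only)."""
    if prompt.startswith("SCHEMA"):
        return "court, outcome"
    field_name = prompt.split("Field: ")[1].split("\n")[0]
    doc = prompt.split("Document: ")[1]
    keywords = {
        "court": ["Supreme Court", "District Court"],
        "outcome": ["granted", "denied"],
    }
    return next((k for k in keywords[field_name] if k in doc), "")


corpus = [
    "The Supreme Court heard the appeal. The appeal was denied.",
    "The District Court reviewed the petition. The petition was granted.",
]
schema, rows = build_database("Which courts grant appeals?", corpus, toy_llm)

# An expert steering the extraction could pass a revised schema instead.
revised_schema, revised_rows = build_database(
    "Which courts grant appeals?", corpus, toy_llm, schema=["court"]
)
```

Storing an evidence sentence next to each value is one simple way to keep the database grounded in the source text, and the optional `schema` argument stands in for the revise-and-rerun loop that the web interface exposes to users.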
