SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark
Sithursan Sivasubramaniam, Cedric Osei-Akoto, Yi Zhang, Kurt Stockinger, Jonathan Fuerst

TL;DR
This paper introduces SM3-Text-to-Query, a comprehensive synthetic benchmark for evaluating medical Text-to-Query systems across multiple database models and query languages, highlighting the impact of database architecture on system performance.
Contribution
It presents the first multi-model medical Text-to-Query benchmark with 10K question/query pairs across relational, document, and graph databases, enabling systematic evaluation of LLM-based approaches.
Findings
Evaluation reveals trade-offs between database models and query languages.
In-context learning approaches vary in effectiveness across models.
Benchmark is extendable to additional languages and real datasets.
Abstract
Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy -- a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation…
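To make the multi-model idea concrete, the sketch below shows how one natural-language question might map to one query per database system named in the abstract, using the usual query language of each (SQL for PostgreSQL, MQL for MongoDB, Cypher for Neo4j, SPARQL for GraphDB). Everything in it is an illustrative assumption rather than the benchmark's actual format: the `BenchmarkItem` class, the `query_for` helper, the question, the table, collection, label, and property names, and the `http://example.org/` prefix are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """One question paired with a query per language (hypothetical layout)."""
    question: str
    queries: dict  # maps a language key to a query string


# Hypothetical example item; schema and property names are assumptions,
# not taken from the benchmark itself.
ITEM = BenchmarkItem(
    question="How many patients have a condition described as 'Hypertension'?",
    queries={
        # Relational model (PostgreSQL)
        "sql": (
            "SELECT COUNT(DISTINCT patient) FROM conditions "
            "WHERE description = 'Hypertension';"
        ),
        # Document model (MongoDB), conditions nested inside patient documents
        "mql": (
            'db.patients.countDocuments('
            '{"conditions.description": "Hypertension"})'
        ),
        # Property graph model (Neo4j)
        "cypher": (
            "MATCH (p:Patient)-[:HAS_CONDITION]->"
            "(c:Condition {description: 'Hypertension'}) "
            "RETURN count(DISTINCT p)"
        ),
        # RDF graph model (GraphDB)
        "sparql": (
            "PREFIX ex: <http://example.org/> "
            "SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { "
            "?p ex:hasCondition ?c . "
            '?c ex:description "Hypertension" . }'
        ),
    },
)


def query_for(item: BenchmarkItem, language: str) -> str:
    """Return the query for one language, or raise ValueError if absent."""
    try:
        return item.queries[language]
    except KeyError:
        raise ValueError(f"no query for language: {language!r}") from None


if __name__ == "__main__":
    for lang in sorted(ITEM.queries):
        print(f"{lang}: {query_for(ITEM, lang)}")
```

The same question needs a join-free filter in the document model, a pattern match in the graph models, and a table scan in the relational model, which is the kind of difference in query shape that a multi-model benchmark is meant to expose.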
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training · Ontology
