PIPE-RDF: An LLM-Assisted Pipeline for Enterprise RDF Benchmarking

Suraj Ranganath

arXiv:2602.18497 · cs.DB · February 24, 2026

TL;DR

PIPE-RDF is a pipeline that creates realistic, schema-specific RDF benchmarks for enterprise knowledge graphs, enabling better evaluation of natural language interfaces and question-answering systems.

Contribution

The paper introduces a pipeline that generates balanced, schema-specific NL-SPARQL benchmarks with high query validity, tailored to evaluating question answering over enterprise knowledge graphs.

Findings

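1. Achieved 100% parse and execution validity after repair, with pre-repair validity of 96.5%-100% across phases
2. Generated a balanced benchmark of 450 question-SPARQL pairs across nine categories
3. Demonstrated the pipeline on a fixed-schema company-location slice (5,000 companies) derived from public RDF data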
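
If "balanced" is read as equal counts per category (an assumption; this page does not give the per-category split), 450 pairs over nine categories works out to 50 pairs per category. The minimal Python sketch below computes such quotas. The function name `balanced_quotas` and the placeholder category labels are hypothetical and do not come from the paper.

```python
def balanced_quotas(total: int, categories: list[str]) -> dict[str, int]:
    """Split `total` items as evenly as possible across `categories`.

    When `total` is not divisible by the number of categories, the first
    `remainder` categories (in list order) each receive one extra item.
    """
    if not categories:
        raise ValueError("need at least one category")
    base, remainder = divmod(total, len(categories))
    return {cat: base + (1 if i < remainder else 0) for i, cat in enumerate(categories)}


# Placeholder labels: the nine category names are not listed on this page.
categories = [f"category_{i}" for i in range(1, 10)]
quotas = balanced_quotas(450, categories)
print(quotas["category_1"], sum(quotas.values()))  # 50 450
```

Fixing quotas up front and then generating until each category reaches its quota is one simple way to enforce balance; the paper may use a different mechanism.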

Abstract

Enterprises rely on RDF knowledge graphs and SPARQL to expose operational data through natural language interfaces, yet public KGQA benchmarks do not reflect proprietary schemas, prefixes, or query distributions. We present PIPE-RDF, a three-phase pipeline that constructs schema-specific NL-SPARQL benchmarks using reverse querying, category-balanced template generation, retrieval-augmented prompting, deduplication, and execution-based validation with repair. We instantiate PIPE-RDF on a fixed-schema company-location slice (5,000 companies) derived from public RDF data and generate a balanced benchmark of 450 question-SPARQL pairs across nine categories. The pipeline achieves 100% parse and execution validity after repair, with pre-repair validity rates of 96.5%-100% across phases. We report entity diversity metrics, template coverage analysis, and cost breakdowns to support deployment…
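
The post-generation stages named in the abstract (deduplication, then execution-based validation with repair) can be sketched as a small loop that also reports pre-repair and post-repair validity rates. This is a hedged illustration, not the paper's implementation. `Pair`, `deduplicate`, `validate_and_repair`, `toy_is_valid`, and `toy_repair` are hypothetical names. The toy validator checks only the query form and brace balance. It stands in for real parsing and execution against an RDF store. The toy repairer only closes unbalanced braces, where a real pipeline would likely re-prompt an LLM with the error.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pair:
    question: str
    sparql: str


def normalize_sparql(query: str) -> str:
    # Collapse runs of whitespace so formatting-only variants compare equal.
    # Case is preserved because SPARQL variables and IRIs are case-sensitive.
    return re.sub(r"\s+", " ", query).strip()


def deduplicate(pairs: list[Pair]) -> list[Pair]:
    # Keep the first pair seen for each normalized query.
    seen: set[str] = set()
    unique: list[Pair] = []
    for pair in pairs:
        key = normalize_sparql(pair.sparql)
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def validate_and_repair(
    pairs: list[Pair],
    is_valid: Callable[[str], bool],
    repair: Callable[[str, str], str],
    max_attempts: int = 2,
) -> tuple[list[Pair], float, float]:
    """Return (kept pairs, pre-repair validity rate, post-repair validity rate)."""
    if not pairs:
        return [], 0.0, 0.0  # convention for an empty input
    kept: list[Pair] = []
    valid_before_repair = 0
    for pair in pairs:
        if is_valid(pair.sparql):
            valid_before_repair += 1
            kept.append(pair)
            continue
        query = pair.sparql
        for _ in range(max_attempts):
            query = repair(pair.question, query)
            if is_valid(query):
                kept.append(Pair(pair.question, query))
                break
        # Pairs still invalid after max_attempts repairs are dropped.
    n = len(pairs)
    return kept, valid_before_repair / n, len(kept) / n


# Toy stand-ins (hypothetical): no PREFIX resolution, parsing, or execution.
def toy_is_valid(query: str) -> bool:
    form_ok = query.lstrip().upper().startswith(("SELECT", "ASK", "CONSTRUCT", "DESCRIBE"))
    return form_ok and query.count("{") == query.count("}")


def toy_repair(question: str, query: str) -> str:
    missing = query.count("{") - query.count("}")
    return query.rstrip() + " }" * max(missing, 0)


pairs = [
    Pair("How many companies are there?", "SELECT (COUNT(?c) AS ?n) WHERE { ?c a ex:Company }"),
    Pair("How many companies are there?", "SELECT (COUNT(?c) AS ?n)\n  WHERE { ?c a ex:Company }"),
    Pair("Which cities host companies?", "SELECT DISTINCT ?city WHERE { ?c ex:locatedIn ?city "),
]
unique = deduplicate(pairs)
kept, pre_rate, post_rate = validate_and_repair(unique, toy_is_valid, toy_repair)
print(len(unique), pre_rate, post_rate)  # 2 0.5 1.0
```

Reporting the pre-repair rate separately from the post-repair rate mirrors how the abstract states its results (96.5%-100% before repair, 100% after). Deduplicating on whitespace-normalized SPARQL rather than lowercased text is a deliberate choice, since SPARQL variables and IRIs are case-sensitive.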

Taxonomy

Topics: Semantic Web and Ontologies · Data Quality and Management · Scientific Computing and Data Management