CompoST: A Benchmark for Analyzing the Ability of LLMs To Compositionally Interpret Questions in a QALD Setting

David Maria Schmidt; Raoul Schubert; Philipp Cimiano

arXiv:2507.21257·cs.AI·October 31, 2025

TL;DR

This paper introduces CompoST, a benchmark for evaluating how well large language models map structurally complex, compositional questions to SPARQL queries. The results reveal significant limitations in how systematically the models interpret such questions.

Contribution

The paper presents a controlled benchmark consisting of three datasets of varying difficulty for assessing whether LLMs interpret questions compositionally, and shows that the models struggle to do so systematically.

Findings

1. Performance drops significantly as question complexity increases.
2. Even with full input information, F1 scores remain low (~0.57).
3. LLMs have limited ability to interpret complex questions compositionally.

Abstract

Language interpretation is a compositional process, in which the meaning of more complex linguistic structures is inferred from the meaning of their parts. Large language models possess remarkable language interpretation capabilities and have been successfully applied to interpret questions by mapping them to SPARQL queries. An open question is how systematic this interpretation process is. Toward this question, in this paper, we propose a benchmark for investigating to what extent the abilities of LLMs to interpret questions are actually compositional. For this, we generate three datasets of varying difficulty based on graph patterns in DBpedia, relying on Lemon lexica for verbalization. Our datasets are created in a very controlled fashion in order to test the ability of LLMs to interpret structurally complex questions, given that they have seen the atomic building blocks. This allows…
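To make the idea of interpreting a complex question from its atomic building blocks concrete, the following is a minimal sketch of how two single-triple graph patterns might be combined into one larger SPARQL query. It is an illustration only, not the benchmark's generation code: the `TriplePattern` class and `compose_query` function are hypothetical names, the DBpedia properties are chosen as examples rather than taken from the datasets, and the `dbo:`/`dbr:` prefix declarations are omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TriplePattern:
    """One atomic graph pattern (hypothetical helper, not from the paper)."""
    subject: str
    predicate: str
    obj: str

    def to_sparql(self) -> str:
        # Render the pattern as a single SPARQL triple line.
        return f"{self.subject} {self.predicate} {self.obj} ."


def compose_query(target: str, patterns: Sequence[TriplePattern]) -> str:
    """Join atomic patterns into one SELECT query over the shared variables."""
    body = "\n  ".join(p.to_sparql() for p in patterns)
    return f"SELECT DISTINCT {target} WHERE {{\n  {body}\n}}"


# Atomic building blocks (illustrative properties, assumed for this sketch).
born_in = TriplePattern("?person", "dbo:birthPlace", "?city")
located_in = TriplePattern("?city", "dbo:country", "dbr:Germany")

# A simple question uses one pattern, e.g. "Which cities are in Germany?"
atomic_query = compose_query("?city", [located_in])

# A structurally more complex question chains both patterns through ?city,
# e.g. "Who was born in a city in Germany?"
composed_query = compose_query("?person", [born_in, located_in])

print(composed_query)
```

A model that interprets questions compositionally would be expected to produce the chained query once it has seen each single-pattern question on its own; the benchmark probes to what extent this holds.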
