STRuCT-LLM: Unifying Tabular and Graph Reasoning with Reinforcement Learning for Semantic Parsing
Josefa Lia Stoisser, Marc Boubnovski Martell, Lawrence Phillips, Casper Hansen, Julien Fauqueur

TL;DR
STRuCT-LLM introduces a unified reinforcement learning framework for large language models to perform structured reasoning over relational and graph data, enabling cross-formalism transfer and strong zero-shot generalization.
Contribution
It presents a novel joint training approach for Text-to-SQL and Text-to-Cypher tasks using RL and Chain-of-Thought supervision, with a topology-aware reward for graph parsing.
Findings
Achieves a 13.5% relative improvement on the Spider benchmark
Improves Text2Cypher performance by a relative 73.1%
Demonstrates strong zero-shot generalization on downstream tasks
Abstract
We propose STRuCT-LLM, a unified framework for training large language models (LLMs) to perform structured reasoning over both relational and graph-structured data. Our approach jointly optimizes Text-to-SQL and Text-to-Cypher tasks using reinforcement learning (RL) combined with Chain-of-Thought (CoT) supervision. To support fine-grained optimization in graph-based parsing, we introduce a topology-aware reward function based on graph edit distance. Unlike prior work that treats relational and graph formalisms in isolation, STRuCT-LLM leverages shared abstractions between SQL and Cypher to induce cross-formalism transfer, enabling SQL training to improve Cypher performance and vice versa, even without shared schemas. Our largest model (QwQ-32B) achieves substantial relative improvements across tasks: on semantic parsing, Spider improves by 13.5% and Text2Cypher by 73.1%. The model…
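The abstract describes the topology-aware reward only as being "based on graph edit distance"; the exact formula is not given here. The following is a minimal sketch of how such a reward could be shaped, under a strong simplifying assumption: the graph pattern of each Cypher query has already been extracted into a set of node labels and a set of labeled edges, and nodes are identified by their labels. Under that assumption the edit distance reduces to counting insertions and deletions (a symmetric difference), which is not the general graph edit distance, since that also searches over node alignments. The names `pattern_edit_distance` and `topology_reward` and the normalization are hypothetical, not taken from the paper.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (source label, relationship type, target label).
Edge = Tuple[str, str, str]


def pattern_edit_distance(
    pred_nodes: Iterable[str],
    pred_edges: Iterable[Edge],
    gold_nodes: Iterable[str],
    gold_edges: Iterable[Edge],
) -> int:
    """Count node/edge insertions and deletions turning pred into gold.

    Assumes nodes are identified by label, so no alignment search is needed.
    """
    node_diff = set(pred_nodes) ^ set(gold_nodes)
    edge_diff = set(pred_edges) ^ set(gold_edges)
    return len(node_diff) + len(edge_diff)


def topology_reward(
    pred_nodes: Iterable[str],
    pred_edges: Iterable[Edge],
    gold_nodes: Iterable[str],
    gold_edges: Iterable[Edge],
) -> float:
    """Map the edit distance to [0, 1]: 1.0 for identical patterns,
    0.0 for patterns that share no node or edge."""
    pred_n, pred_e = set(pred_nodes), set(pred_edges)
    gold_n, gold_e = set(gold_nodes), set(gold_edges)
    # Largest possible distance: delete everything predicted, insert everything gold.
    max_dist = len(pred_n) + len(pred_e) + len(gold_n) + len(gold_e)
    if max_dist == 0:
        return 1.0  # two empty patterns are trivially identical
    dist = pattern_edit_distance(pred_n, pred_e, gold_n, gold_e)
    return 1.0 - dist / max_dist


# Illustrative patterns: same nodes, wrong relationship type in the prediction.
gold_nodes = {"Person", "Movie"}
gold_edges = {("Person", "ACTED_IN", "Movie")}
pred_edges = {("Person", "DIRECTED", "Movie")}

partial = topology_reward(gold_nodes, pred_edges, gold_nodes, gold_edges)
exact = topology_reward(gold_nodes, gold_edges, gold_nodes, gold_edges)
```

In this sketch a prediction with the right nodes but the wrong relationship still earns partial credit, which is the kind of graded signal a binary execution check cannot provide.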
Peer Reviews
Decision: Submitted to ICLR 2026
The paper's methodology is interesting and the writing is clear. The main question is its significance.
The main question for me is the significance of the result, and whether this approach actually moves the needle.
- The reward design is novel: it combines multi-signal rewards with a topology-aware structural reward for Cypher, going beyond binary execution checks.
- Transfer across formalisms is bidirectional, which reduces both logical and data-reference errors.
- The claims are credible: the error-type breakdown and ablation studies make the improvements interpretable and trustworthy.
- The CoT plus GRPO pipeline, reward mixing, and cross-formalism setup can be reused for new da
- The figures are too terse. For example, Figure 1 shows a unified pipeline but does not explain how the same natural-language question is systematically mapped into SQL and Cypher. It also does not show where the two paths diverge and converge, which makes the figure harder to understand.
- RL uses a modest sample size (~3.5k per language), and the Cypher evaluation shares a lineage with the training data, which is a limitation.
- There is no dedicated hallucination metric.
- The evaluation relie
1. The authors are the first to focus on Cypher, a graph-based semantic parsing domain, with reinforcement learning. 2. The experiments are sound.
1. The paper as a whole is trivial: it simply implements the GRPO algorithm in semantic parsing domains and runs experiments. The authors appear to have refined only the reward function, with the other parts no different from GRPO-based papers in other domains. Moreover, there are many GRPO-based papers for Text-to-SQL training, such as SQL-R1, Reasoning-SQL, and Arctic-SQL, and the authors did not compare their results with those relevant methods. From my perspective, the only difference is the reward function. Therefore, I think
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Graph Neural Networks · Topic Modeling
