Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA

Fengyu Li; Junhao Zhu; Kaishi Song; Lu Chen; Zhongming Yao; Tianyi Li; Christian S. Jensen

arXiv:2602.22721 · cs.DB · April 2, 2026

TL;DR

This paper introduces Operation-R1, a framework that trains lightweight LLMs to generate data-preparation pipelines for table question answering in a single inference step, reducing latency and cost.

Contribution

The paper presents a novel variant of reinforcement learning with verifiable rewards that trains lightweight LLMs to generate a complete data-preparation pipeline in one inference step, improving efficiency and robustness over multi-step methods.

Findings

01

Achieves 8.83 and 4.44 percentage point accuracy improvements over baselines.

02

Reduces monetary cost by a factor of 2.2.

03

Compresses table data by 79%.

Abstract

Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to…
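To make the idea of a one-step data-preparation pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes the model emits the whole pipeline as one JSON list of operator calls, which is then executed against the table before answering. The operator names (`filter_rows`, `select_columns`, `sort_rows`), the JSON schema, and the toy table are all hypothetical; the abstract does not specify the actual operator set or output format.

```python
import json
import operator

# Hypothetical comparator vocabulary for the illustrative filter operator.
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def filter_rows(rows, column, op, value):
    """Keep only the rows whose `column` satisfies the comparison."""
    compare = COMPARATORS[op]
    return [row for row in rows if compare(row[column], value)]

def select_columns(rows, columns):
    """Keep only the named columns in every row."""
    return [{c: row[c] for c in columns} for row in rows]

def sort_rows(rows, column, descending=False):
    """Order rows by the values in `column`."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)

# Assumed operator registry; the paper's real operators may differ.
OPERATORS = {
    "filter_rows": filter_rows,
    "select_columns": select_columns,
    "sort_rows": sort_rows,
}

def run_pipeline(rows, pipeline_json):
    """Execute a pipeline that was produced in a single model response."""
    for step in json.loads(pipeline_json):
        rows = OPERATORS[step["op"]](rows, **step["args"])
    return rows

def cell_count(rows):
    """Count table cells, a rough proxy for how much data reaches the QA step."""
    return sum(len(row) for row in rows)

# Made-up toy table for the question "Which tool is the most expensive?"
table = [
    {"item": "hammer", "category": "tool", "price": 12.5, "stock": 4},
    {"item": "apple", "category": "food", "price": 0.5, "stock": 100},
    {"item": "drill", "category": "tool", "price": 89.0, "stock": 2},
    {"item": "bread", "category": "food", "price": 2.0, "stock": 30},
]

# Stand-in for the single LLM output: the full pipeline, not one step per call.
pipeline_json = json.dumps([
    {"op": "filter_rows", "args": {"column": "category", "op": "==", "value": "tool"}},
    {"op": "select_columns", "args": {"columns": ["item", "price"]}},
    {"op": "sort_rows", "args": {"column": "price", "descending": True}},
])

prepared = run_pipeline(table, pipeline_json)
reduction = 1 - cell_count(prepared) / cell_count(table)
print(prepared[0]["item"], f"{reduction:.0%} fewer cells")
```

In a multi-step design each of the three operators would typically cost its own model call; here the entire list comes from one response, which is the source of the latency and cost savings the abstract describes. The cell-count reduction on this toy table only illustrates how a pipeline shrinks the data passed downstream and says nothing about the paper's reported compression figure.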
