PiLLar: Matching for Pivot Table Schema via LLM-guided Monte-Carlo Tree Search
Yunjun Gao, Chuangyu Ouyang, Congcong Ge, Yifan Zhu

TL;DR
PiLLar is a novel framework that uses LLM-guided Monte-Carlo Tree Search for accurate, privacy-preserving schema matching between pivot tables and relational tables in data lakes.
Contribution
It introduces a training-free, domain-adaptive schema matching method leveraging LLMs and provides a new benchmark dataset for evaluation.
Findings
PiLLar achieves 87.94% average accuracy on the PTbench benchmark.
The method operates with minimal annotated data and ensures asymptotic convergence.
Extensive experiments validate PiLLar's superiority over existing approaches.
Abstract
Pivot tables are ubiquitous in data lakes of modern data ecosystems, making accurate schema matching over pivot tables a key prerequisite for data integration. In this paper, we focus on matching for pivot table schema, which is a novel joint schema-value matching task. It aims to align schemas between pivot tables and standard relational tables, where a correct match must be semantically consistent at the schema level and compatible at the value level. However, because the data involved in this task is inherently sensitive, anonymized data is prevalent in practice, which poses significant challenges to matching accuracy and generalization capability. To tackle these challenges, we propose PiLLar, the first framework for matching pivot table schemas. We first formulate PiLLar as an LLM-driven search paradigm that operates with minimal annotated privacy-compliant data, thereby achieving…
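To make the "joint schema-value" aspect of the task concrete, here is a minimal sketch of how a pivot table relates to a standard relational table. It is not PiLLar's method and does not come from the paper: the toy data, the `unpivot` helper, and the attribute names `year` and `sales` are all assumptions chosen for illustration. The point is only that pivoted column headers (here `2021`, `2022`) typically correspond to values of one relational attribute, while the cells correspond to values of another, so a match has to hold at both the schema and the value level.

```python
# Hypothetical toy data: a pivot table whose pivoted column headers are years.
pivot_header = ["region", "2021", "2022"]
pivot_rows = [
    ["North", 120, 135],
    ["South", 90, 110],
]


def unpivot(header, rows, id_cols=1, var_name="year", value_name="sales"):
    """Turn each (row, pivoted column) cell into one relational record.

    The first `id_cols` columns are kept as identifiers; every remaining
    column header becomes a value of `var_name`, and the cell under it
    becomes a value of `value_name`. Both names are illustrative assumptions.
    """
    records = []
    for row in rows:
        ids = dict(zip(header[:id_cols], row[:id_cols]))
        for col, cell in zip(header[id_cols:], row[id_cols:]):
            records.append({**ids, var_name: col, value_name: cell})
    return records


relational = unpivot(pivot_header, pivot_rows)
for record in relational:
    print(record)
```

In this toy case, the pivot headers `2021` and `2022` would need to be matched to a relational attribute such as `year`, and the numeric cells to an attribute such as `sales`. With anonymized data the attribute names may carry little meaning, which is one reason value-level compatibility matters alongside schema-level semantics.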
