ZTab: Domain-based Zero-shot Annotation for Table Columns
Ehsan Hoseinzade, Ke Wang

TL;DR
ZTab is a domain-based zero-shot framework for automatically detecting semantic column types in tables. It requires no user-provided labeled data and targets the weaknesses of existing zero-shot models, in particular poor accuracy when the number of semantic types is large.
Contribution
The paper proposes a domain-based zero-shot approach: an annotation LLM is fine-tuned on pseudo-tables generated from a domain configuration (a set of predefined semantic types plus sample table schemas), with the aim of improving zero-shot accuracy without user-specific training data.
Findings
Effective in multiple domain configurations
Reduces privacy risks compared to closed-source LLMs
Achieves competitive zero-shot performance
Abstract
This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user-provided labeled training data, making it ideal for scenarios where data collection is costly or restricted due to privacy concerns. However, existing zero-shot models suffer from poor performance when the number of semantic column types is large, limited understanding of tabular structure, and privacy risks arising from dependence on high-performance closed-source LLMs. We introduce ZTab, a domain-based zero-shot framework that addresses both performance and zero-shot requirements. Given a domain configuration consisting of a set of predefined semantic types and sample table schemas, ZTab generates pseudo-tables for the sample schemas and fine-tunes an annotation LLM on them. ZTab is…
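The pipeline outlined in the abstract (domain configuration, then pseudo-tables, then fine-tuning data) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `DomainConfig` class, the fixed `VALUE_POOLS` dictionary standing in for whatever value generator ZTab actually uses, the prompt wording, and the helper functions. The fine-tuning step itself is omitted; the sketch only shows how prompt/label pairs might be derived from sample schemas.

```python
import random
from dataclasses import dataclass


@dataclass
class DomainConfig:
    """Hypothetical container: predefined semantic types plus sample schemas."""
    semantic_types: list[str]
    schemas: list[list[str]]  # each schema is a list of semantic types, one per column


# Assumption: a fixed pool of example values per type stands in for the
# (unspecified here) mechanism that produces pseudo-table cell values.
VALUE_POOLS = {
    "city": ["Paris", "Lagos", "Osaka", "Lima"],
    "country": ["France", "Nigeria", "Japan", "Peru"],
    "year": ["1998", "2007", "2015", "2021"],
}


def generate_pseudo_table(schema, n_rows, rng):
    """Return a list of (semantic_type, column_values) pairs for one schema."""
    return [(t, [rng.choice(VALUE_POOLS[t]) for _ in range(n_rows)]) for t in schema]


def to_training_examples(table, semantic_types):
    """Turn each pseudo-table column into a prompt/completion pair."""
    examples = []
    for label, values in table:
        prompt = (
            "Column values: " + ", ".join(values) + "\n"
            "Candidate types: " + ", ".join(semantic_types) + "\n"
            "Type:"
        )
        examples.append({"prompt": prompt, "completion": label})
    return examples


def build_dataset(config, tables_per_schema=2, n_rows=3, seed=0):
    """Generate pseudo-tables for every sample schema and flatten to examples."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    dataset = []
    for schema in config.schemas:
        unknown = set(schema) - set(config.semantic_types)
        if unknown:
            raise ValueError(f"schema uses undeclared types: {sorted(unknown)}")
        for _ in range(tables_per_schema):
            table = generate_pseudo_table(schema, n_rows, rng)
            dataset.extend(to_training_examples(table, config.semantic_types))
    return dataset


config = DomainConfig(
    semantic_types=["city", "country", "year"],
    schemas=[["city", "country"], ["country", "year"]],
)
dataset = build_dataset(config)
```

In this sketch, the resulting `dataset` would be passed to whatever fine-tuning routine is used for the annotation LLM. Because the labels come from the domain configuration rather than from user tables, no user-provided labeled data is involved, which is the property the abstract emphasizes.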
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data Visualization and Analytics
