AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks
Xinyi He, Mengyu Zhou, Mingjie Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian Yuan, Dongmei Zhang

TL;DR
AnaMeta is a large dataset of 467,000 tables labeled with four types of field metadata. Together with a new multi-encoder framework (KDF) and a set of interfaces, it supports better understanding of tabular data for analysis tasks.
Contribution
The paper introduces AnaMeta, a comprehensive dataset for field metadata understanding in tabular data analysis, and KDF, a novel multi-encoder framework that improves this understanding.
Findings
KDF outperforms baseline models in metadata inference tasks.
The dataset enables effective training and evaluation of metadata understanding models.
Proposed interfaces facilitate integration of metadata into downstream analysis.
Abstract
Tabular data analysis is performed every day across various domains. It requires an accurate understanding of field semantics to correctly operate on table fields and find common patterns in daily analysis. In this paper, we introduce the AnaMeta dataset, a collection of 467k tables with derived supervision labels for four types of commonly used field metadata: measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. We evaluate a wide range of models for inferring metadata as the benchmark. We also propose a multi-encoder framework, called KDF, which improves the metadata understanding capability of tabular models by incorporating distribution and knowledge information. Furthermore, we propose four interfaces for incorporating field metadata into downstream analysis tasks.
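To make the four label types concrete, here is a minimal Python sketch of how one field's metadata might be represented and then consumed by a downstream step, such as picking an aggregation for a chart or pivot table. The class names, label strings, and the aggregation mapping are hypothetical illustrations; they are not AnaMeta's actual schema, KDF, or the paper's four interfaces.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List, Optional


class FieldKind(Enum):
    """Measure/dimension dichotomy for a field."""
    MEASURE = "measure"
    DIMENSION = "dimension"


@dataclass
class FieldMetadata:
    """Hypothetical record holding the four metadata labels for one field."""
    name: str
    kind: FieldKind
    role: Optional[str] = None                 # common field role (label names are placeholders)
    semantic_type: Optional[str] = None        # semantic field type (placeholder names)
    default_aggregation: Optional[str] = None  # e.g. "sum" or "mean"; mainly for measures


# Hypothetical mapping from aggregation labels to Python functions.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
}


def summarize(column: List[float], meta: FieldMetadata) -> Optional[float]:
    """Aggregate a measure column with its labeled default aggregation.

    Returns None for dimensions or for fields without an aggregation label.
    """
    if meta.kind is not FieldKind.MEASURE or meta.default_aggregation is None:
        return None
    return AGGREGATORS[meta.default_aggregation](column)


# Made-up example labels for a tiny sales table (not taken from AnaMeta).
region = FieldMetadata("Region", FieldKind.DIMENSION, semantic_type="Region")
revenue = FieldMetadata("Revenue", FieldKind.MEASURE, default_aggregation="sum")
price = FieldMetadata("UnitPrice", FieldKind.MEASURE, default_aggregation="mean")

print(summarize([120.0, 80.0, 50.0], revenue))  # 250.0
print(summarize([2.0, 4.0], price))             # 3.0
print(summarize([1.0, 2.0], region))            # None
```

The sketch only shows why metadata matters downstream: once a field is labeled as a measure with a default aggregation, a tool can aggregate it without guessing, while dimensions are left for grouping.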
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Database Systems and Queries
