Column Vocabulary Association (CVA): semantic interpretation of dataless tables
Margherita Martorana, Xueli Pan, Benno Kruit, Tobias Kuhn, Jacco van Ossenbruggen

TL;DR
This paper explores semantic annotation of table headers using only metadata, evaluating various large language models and traditional methods in a zero-shot setting to understand their effectiveness and limitations.
Contribution
It introduces the Column Vocabulary Association (CVA) task for metadata-only semantic annotation and evaluates multiple LLMs and traditional approaches in this context.
Findings
LLMs perform well at temperatures below 1.0, achieving high accuracy.
Traditional methods outperform LLMs when data and glossary are related.
Data nature significantly influences CVA task performance.
Abstract
Traditional Semantic Table Interpretation (STI) methods rely primarily on the underlying table data to create semantic annotations. This year's SemTab challenge introduced the "Metadata to KG" track, which focuses on performing STI using only metadata, without access to the underlying data. In response to this new challenge, we introduce a new term: Column Vocabulary Association (CVA). This term refers to the task of semantically annotating column headers based solely on metadata. In this study, we evaluate the performance of various methods on the CVA task, including an approach based on Large Language Models (LLMs) and Retrieval Augmented Generation (RAG), as well as a more traditional similarity approach with SemanticBERT. Our methodology uses a zero-shot setting, with no pretraining or examples passed to the LLMs, as we aim to…
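To make the CVA task concrete, the following is a minimal sketch of a similarity-based association between column headers and glossary terms. It is not the authors' implementation: token-count cosine similarity stands in for the embedding model, and the names `tokenize`, `cosine`, `associate`, and the sample `glossary` and `headers` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(headers, glossary):
    """Map each column header to the most similar glossary term.

    Only metadata is used: the header string on one side, the term
    name plus its description on the other. No cell values are read.
    """
    term_vecs = {
        term: Counter(tokenize(term + " " + desc))
        for term, desc in glossary.items()
    }
    result = {}
    for header in headers:
        header_vec = Counter(tokenize(header))
        result[header] = max(
            term_vecs, key=lambda term: cosine(header_vec, term_vecs[term])
        )
    return result


# Hypothetical glossary (term -> description) and table headers.
glossary = {
    "birthDate": "date of birth of a person",
    "postalCode": "postal code of an address",
}
headers = ["date_of_birth", "zip_postal_code"]

print(associate(headers, glossary))
```

A real pipeline would presumably replace `cosine` over token counts with similarity over sentence embeddings, or replace `associate` with a prompt to an LLM; the input and output shape of the task stays the same.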
Taxonomy
Topics: Semantic Web and Ontologies · Data Quality and Management · Data Mining Algorithms and Applications
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Linear Warmup With Cosine Annealing · Adam · WordPiece
