KGLink: A column type annotation method that combines knowledge graph and pre-trained language model
Yubo Wang, Hao Xin, Lei Chen

TL;DR
KGLink combines knowledge graph data with a pre-trained language model to improve the semantic annotation of table columns, addressing the type granularity and missing-context issues.
Contribution
It introduces a hybrid approach that effectively integrates KG information with deep learning to enhance column annotation accuracy and scalability.
Findings
Outperforms existing methods on diverse tabular datasets.
Addresses the type granularity and missing-context issues.
Demonstrates robustness across numeric and string columns.
Abstract
The semantic annotation of tabular data plays a crucial role in various downstream tasks. Previous research has proposed knowledge graph (KG)-based and deep learning-based methods, each with its inherent limitations. KG-based methods encounter difficulties annotating columns when there is no match for column cells in the KG. Moreover, KG-based methods can provide multiple predictions for one column, making it challenging to determine the semantic type with the most suitable granularity for the dataset. This type granularity issue limits their scalability. On the other hand, deep learning-based methods face challenges related to the valuable context missing issue. This occurs when the information within the table is insufficient for determining the correct column type. This paper presents KGLink, a method that combines WikiData KG information with a pre-trained deep learning language…
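The two KG-side problems named in the abstract (cells with no KG match, and several candidate types at different granularities) can be illustrated with a small sketch. This is not KGLink's method and does not query WikiData: the `TOY_KG` dictionary, its type labels, and the `candidate_types` helper are all hypothetical, made up only to show why a plain lookup-and-vote approach leaves the column type ambiguous.

```python
from collections import Counter

# Hypothetical toy knowledge graph (an assumption, not WikiData):
# each cell mention maps to candidate types, ordered from fine to coarse.
TOY_KG = {
    "Lionel Messi": ["footballer", "athlete", "human"],
    "Serena Williams": ["tennis player", "athlete", "human"],
    "LeBron James": ["basketball player", "athlete", "human"],
}

def candidate_types(column):
    """Count how many cells link to each candidate type; collect unmatched cells."""
    votes = Counter()
    unmatched = []
    for cell in column:
        types = TOY_KG.get(cell)
        if types is None:
            # No KG match for this cell: it contributes no evidence at all.
            unmatched.append(cell)
            continue
        votes.update(types)
    return votes, unmatched

column = ["Lionel Messi", "Serena Williams", "LeBron James", "J. Doe"]
votes, unmatched = candidate_types(column)

# Several types can tie for the most votes, so voting alone
# cannot decide which granularity the dataset expects.
top = max(votes.values())
tied = sorted(t for t, n in votes.items() if n == top)
print(tied)
print(unmatched)
```

In this toy column both "athlete" and "human" cover every matched cell, so the vote cannot choose between them, and "J. Doe" has no entry to vote with.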
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
