RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph

Lindsey Linxi Wei; Guorui Xiao; Magdalena Balazinska

arXiv:2409.14556 · cs.DB · November 4, 2024

TL;DR

RACOON improves Large Language Model-based Column Type Annotation by using a Knowledge Graph to enrich the context given to the model, raising micro F-1 by up to 0.21 over vanilla LLM inference.

Contribution

This paper introduces RACOON, a framework that combines the LLM's pre-trained parametric knowledge with non-parametric knowledge retrieved from a Knowledge Graph to improve LLM-based CTA.

Findings

01. Achieves up to a 0.21 micro F-1 improvement over vanilla LLM inference.

02. Demonstrates the effectiveness of KG-augmented context in CTA tasks.

03. Validates the approach through extensive experiments.
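
The headline result is stated in micro F-1, which pools true positives, false positives, and false negatives across all columns before computing the score. The sketch below shows that computation for multi-label CTA, assuming each column has a set of gold types and a set of predicted types. The function name `micro_f1` and the toy labels are illustrative and do not come from the paper.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over columns, each with a set of type labels."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when nothing is labeled
    return 2 * tp / denom if denom else 0.0


# Toy data (made up for illustration): three columns
gold = [{"city"}, {"person", "athlete"}, {"country"}]
pred = [{"city"}, {"person"}, {"city"}]
print(round(micro_f1(gold, pred), 3))
```

Because counts are pooled before averaging, columns with many labels and frequent types weigh more heavily than under macro averaging.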

Abstract

As an important component of data exploration and integration, Column Type Annotation (CTA) aims to label columns of a table with one or more semantic types. With the recent development of Large Language Models (LLMs), researchers have started to explore the possibility of using LLMs for CTA, leveraging their strong zero-shot capabilities. In this paper, we build on this promising work and improve on LLM-based methods for CTA by showing how to use a Knowledge Graph (KG) to augment the context information provided to the LLM. Our approach, called RACOON, combines both pre-trained parametric and non-parametric knowledge during generation to improve LLMs' performance on CTA. Our experiments show that RACOON achieves up to a 0.21 micro F-1 improvement compared against vanilla LLM inference.
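
The abstract's core idea, augmenting the LLM's context with Knowledge Graph information, can be illustrated with a small sketch. This is not RACOON's actual pipeline: the in-memory `TOY_KG` dictionary, the helper names `retrieve_kg_hints` and `build_prompt`, and the prompt wording are all assumptions made for illustration, and no real KG or LLM is called.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical toy KG: cell surface form -> entity types (illustrative only)
TOY_KG: Dict[str, List[str]] = {
    "Seattle": ["city"],
    "Portland": ["city"],
    "Washington": ["state", "city", "person"],
}


def retrieve_kg_hints(cells: List[str], kg: Dict[str, List[str]],
                      top_k: int = 3) -> List[str]:
    """Look up each cell in the KG and return the most frequent entity types."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update(kg.get(cell.strip(), []))
    return [entity_type for entity_type, _ in counts.most_common(top_k)]


def build_prompt(cells: List[str], label_set: List[str],
                 kg: Dict[str, List[str]]) -> str:
    """Assemble a CTA prompt; add KG hints only when retrieval finds any."""
    hints = retrieve_kg_hints(cells, kg)
    lines = [
        "Column values: " + ", ".join(cells),
        "Candidate types: " + ", ".join(label_set),
    ]
    if hints:
        lines.append("Knowledge-graph hints: " + ", ".join(hints))
    lines.append("Answer with the best-matching type(s).")
    return "\n".join(lines)


column = ["Seattle", "Portland", "Washington"]
print(build_prompt(column, ["city", "state", "person"], TOY_KG))
```

Without the hints line this reduces to a plain zero-shot prompt, which corresponds to the vanilla LLM inference the abstract compares against. The hints carry the non-parametric knowledge, while the LLM's own weights supply the parametric part.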

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Advanced Computational Techniques and Applications