Robust LLM-based Column Type Annotation via Prompt Augmentation with LoRA Tuning
Hanze Meng, Jianhao Cao, Rachel Pottinger

TL;DR
This paper introduces a parameter-efficient LoRA-based prompt augmentation method for column type annotation, improving robustness and accuracy across datasets and prompt variations without extensive re-training.
Contribution
The paper proposes a parameter-efficient fine-tuning framework that combines LoRA with prompt augmentation to improve both robustness and accuracy on column type annotation.
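LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes h = xW + (alpha / r) · xAB, where A is d_in × r and B is r × d_out. The pure-Python sketch below illustrates that standard LoRA update and the resulting parameter savings. The shapes, rank r = 8, scaling alpha, and 4096-wide layer are illustrative assumptions. The authors' actual target layers and hyperparameters are not stated here.

```python
import random

def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (both stored as lists of rows)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_forward(x, w, a, b, alpha, r):
    """Compute h = x W + (alpha / r) * x A B.
    W (d_in x d_out) stays frozen; only A (d_in x r) and B (r x d_out) are trained."""
    base = matmul(x, w)
    delta = matmul(matmul(x, a), b)
    scale = alpha / r
    return [[base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
            for i in range(len(base))]

def trainable_params(d_in, d_out, r):
    """Parameters updated by full fine-tuning vs. by a rank-r adapter on one weight matrix."""
    return d_in * d_out, r * (d_in + d_out)

# Tiny example with assumed sizes: B starts at zero, so the adapted layer
# initially reproduces the frozen layer's output exactly.
random.seed(0)
d_in, d_out, r = 4, 3, 2
x = [[1.0, 2.0, 3.0, 4.0]]
w = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
a = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]
b = [[0.0] * d_out for _ in range(r)]
h = lora_forward(x, w, a, b, alpha=16, r=r)

# Parameter count for a hypothetical 4096 x 4096 projection with rank 8.
full, lora = trainable_params(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

Under these assumptions, the adapter trains about 0.4% of the parameters of that one matrix. This is the kind of saving behind the reduced cost relative to full fine-tuning.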
Findings
Achieves higher weighted F1 scores than single-prompt fine-tuning.
Maintains stable performance across diverse prompt patterns.
Reduces computational costs compared to full-model fine-tuning.
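The findings are reported as weighted F1. Under the usual definition, each label's F1 is averaged with a weight proportional to that label's number of true columns (its support), so frequent semantic types dominate the score. The sketch below computes that standard metric, matching scikit-learn's `average="weighted"` behaviour. The authors' exact evaluation code is not shown here, and the labels are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-label F1, averaged with weights proportional to each label's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # labels that are only ever predicted have weight 0
    return score

# Hypothetical column types and model predictions.
y_true = ["city", "city", "person", "date"]
y_pred = ["city", "person", "person", "date"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```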
Abstract
Column Type Annotation (CTA) is a fundamental step towards enabling schema alignment and semantic understanding of tabular data. Existing encoder-only language models achieve high accuracy when fine-tuned on labeled columns, but their applicability is limited to in-domain settings, as distribution shifts in tables or label spaces require costly re-training from scratch. Recent work has explored prompting generative large language models (LLMs) by framing CTA as a multiple-choice task, but these approaches face two key challenges: (1) model performance is highly sensitive to subtle changes in prompt wording and structure, and (2) annotation F1 scores remain modest. A natural extension is to fine-tune large language models. However, fully fine-tuning these models incurs prohibitive computational costs due to their scale, and the sensitivity to prompts is not eliminated. In this paper, we…
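When CTA is framed as a multiple-choice task, the model sees a column's values plus a list of candidate types and must answer with one type. The abstract notes that small wording changes shift performance. One plausible form of prompt augmentation is to pair each training column with several phrasings, all sharing the same target label, so that fine-tuning does not overfit to a single template. The sketch below illustrates that idea only. The templates, the `augment` helper, and the option shuffling are hypothetical choices, not the authors' actual prompts.

```python
import random

# Hypothetical prompt templates; the paper's actual wording is not reproduced here.
TEMPLATES = [
    "Column values: {values}\nWhich type best describes this column?\nOptions: {options}\nAnswer:",
    "Here is a table column: {values}.\nChoose its semantic type from: {options}.\nType:",
    "Select one label from [{options}] for the column containing {values}.\nLabel:",
]

def build_prompt(template, values, label_space):
    """Fill one template with the column's values and the candidate types."""
    return template.format(values=", ".join(values), options=", ".join(label_space))

def augment(values, label, label_space, templates=TEMPLATES, seed=0):
    """Return one (prompt, target) pair per template, with option order shuffled.
    Shuffling is an assumed extra augmentation; the input label list is not modified."""
    rng = random.Random(seed)
    pairs = []
    for template in templates:
        options = label_space[:]
        rng.shuffle(options)
        pairs.append((build_prompt(template, values, options), label))
    return pairs

labels = ["city", "person", "date"]
pairs = augment(["Vancouver", "Toronto", "Montreal"], "city", labels)
for prompt, target in pairs:
    print(prompt, "->", target)
```

Every variant keeps the same target, so a LoRA adapter trained on the pooled pairs is pushed to map the column content to the label regardless of phrasing.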
