Database Entity Recognition with Data Augmentation and Deep Learning

Zikun Fu; Chen Yang; Kourosh Davoudi; Ken Q. Pu

arXiv:2508.19372·cs.CL·August 28, 2025

TL;DR

This paper introduces a new benchmark, a data augmentation method, and a T5-based model for database entity recognition in natural language queries. The model achieves higher precision and recall than two state-of-the-art NER taggers.

Contribution

The paper presents a novel data augmentation technique, a specialized T5-based entity recognition model, and a human-annotated benchmark for database entity recognition (DB-ER) in natural language queries (NLQ), advancing the state of the art in this task.

Findings

01

Data augmentation improves precision and recall by over 10%.

02

Fine-tuning the T5 backbone boosts metrics by 5-10%.

03

The proposed model outperforms two state-of-the-art NER taggers in both precision and recall.

Abstract

This paper addresses the challenge of Database Entity Recognition (DB-ER) in Natural Language Queries (NLQ). We present several key contributions to advance this field: (1) a human-annotated benchmark for the DB-ER task, derived from popular text-to-SQL benchmarks; (2) a novel data augmentation procedure that automatically annotates NLQs based on the corresponding SQL queries, which are available in popular text-to-SQL benchmarks; and (3) a specialized language-model-based entity recognition model that uses T5 as a backbone, with two downstream DB-ER tasks: sequence tagging for fine-tuning the backbone and token classification for performing DB-ER. We compared our DB-ER tagger with two state-of-the-art NER taggers and observed better precision and recall for our model. The ablation evaluation shows that data augmentation boosts precision and recall by over…
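
To make the idea of SQL-driven automatic annotation concrete, here is a minimal sketch in Python. It is not the authors' procedure, which this summary does not describe in detail. It assumes a simple heuristic: an NLQ token is labeled as a database entity if it matches an identifier or literal value in the paired SQL query. The function names (`sql_terms`, `annotate`), the binary `ENT`/`O` label set, the keyword list, and the naive plural handling are all hypothetical choices made for illustration.

```python
import re

# Assumption: a small, incomplete keyword list is enough for illustration.
SQL_KEYWORDS = {
    "select", "from", "where", "and", "or", "not", "join", "on", "as",
    "group", "by", "order", "having", "limit", "distinct", "in", "like",
    "between", "is", "null", "asc", "desc", "count", "avg", "sum", "min", "max",
}


def normalize(word):
    """Lowercase a word and strip a trailing 's' (a naive plural heuristic)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


def sql_terms(sql):
    """Collect normalized words from the SQL's identifiers and string literals."""
    literals = re.findall(r"'([^']*)'", sql)
    without_literals = re.sub(r"'[^']*'", " ", sql)
    identifiers = [
        tok for tok in re.findall(r"[A-Za-z_][A-Za-z_0-9]*", without_literals)
        if tok.lower() not in SQL_KEYWORDS
    ]
    terms = set()
    for ident in identifiers:
        # Split snake_case identifiers such as "song_name" into words.
        terms.update(normalize(part) for part in ident.split("_") if part)
    for literal in literals:
        terms.update(normalize(part) for part in literal.split())
    return terms


def annotate(nlq, sql):
    """Label each NLQ token 'ENT' if it matches a SQL term, otherwise 'O'."""
    terms = sql_terms(sql)
    tokens = re.findall(r"\w+", nlq)
    return [(tok, "ENT" if normalize(tok) in terms else "O") for tok in tokens]


if __name__ == "__main__":
    pairs = annotate(
        "Show the names of singers from France",
        "SELECT name FROM singer WHERE country = 'France'",
    )
    for token, label in pairs:
        print(f"{token}\t{label}")
```

Token-level labels of this kind are the sort of automatically generated supervision that a token classification model could be trained on. A realistic pipeline would also need to handle synonyms, multi-word entities, and values that are paraphrased in the question, none of which this sketch covers.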
