DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries
Simiao Zuo, Pengfei Tang, Xinyu Hu, Qiang Lou, Jian Jiao, Denis Charles

TL;DR
DeepTagger is a novel knowledge-enhanced NER model tailored for web-based ad queries, addressing challenges of data scarcity and query characteristics by leveraging unlabeled data, search results, and large language models.
Contribution
The paper introduces DeepTagger, combining model-free and model-based knowledge enhancement techniques, including data augmentation and prompt-based labeling, for improved web query NER.
Findings
DeepTagger outperforms baseline models on multiple NER benchmarks.
Knowledge augmentation significantly improves recognition accuracy.
Prompt-based labeling reduces reliance on annotated datasets.
Abstract
Named entity recognition (NER) is a crucial task for online advertisement. State-of-the-art solutions leverage pre-trained language models for this task. However, three major challenges remain unresolved: web queries differ from natural language, on which pre-trained models are trained; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger, a knowledge-enhanced NER model for web-based ads queries. The proposed knowledge enhancement framework leverages both model-free and model-based approaches. For model-free enhancement, we collect unlabeled web queries to augment domain knowledge; and we collect web search results to enrich the information of ads queries. We further leverage effective prompting methods to automatically generate labels using large language models such as ChatGPT. Additionally, we adopt a model-based knowledge…
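The model-free enhancements described in the abstract (enriching a short query with search results, and prompting a large language model for labels) might look roughly like the following minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the entity types, the `[SEP]` separator, the prompt wording, the tab-separated response format, and the function names `enrich_query`, `build_labeling_prompt`, and `parse_labels` are all hypothetical. No LLM is called; the sketch only builds the prompt and parses a response string.

```python
from typing import Dict, List

# Hypothetical entity types; the paper's actual label set is not given here.
ENTITY_TYPES = ["BRAND", "PRODUCT", "LOCATION"]


def enrich_query(query: str, snippets: List[str], max_snippets: int = 2) -> str:
    """Append up to max_snippets search-result snippets to a short query."""
    context = " ".join(s.strip() for s in snippets[:max_snippets] if s.strip())
    # "[SEP]" is an assumed separator, not one taken from the paper.
    return f"{query} [SEP] {context}" if context else query


def build_labeling_prompt(query: str, entity_types: List[str]) -> str:
    """Build a zero-shot prompt asking an LLM to tag each token of the query."""
    return (
        "Label each token of the web query with one of: "
        + ", ".join(entity_types) + ", or O.\n"
        "Answer with one 'token<TAB>label' pair per line.\n"
        f"Query: {query}"
    )


def parse_labels(response: str, query: str, entity_types: List[str]) -> List[str]:
    """Map an LLM response back to one label per query token; default to O."""
    allowed = set(entity_types)
    predicted: Dict[str, str] = {}
    for line in response.splitlines():
        parts = line.split("\t")
        # Keep only well-formed lines whose label is in the allowed set.
        if len(parts) == 2 and parts[1].strip() in allowed:
            predicted[parts[0].strip().lower()] = parts[1].strip()
    return [predicted.get(tok.lower(), "O") for tok in query.split()]
```

In this sketch, tokens the model omits or labels with an unknown type fall back to `O`, so a malformed response degrades to fewer entities instead of raising an error.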
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
