Dynamically Acquiring Text Content to Enable the Classification of Lesser-known Entities for Real-world Tasks
Fahmida Alam, Ellen Riloff

TL;DR
This paper presents a framework that dynamically acquires descriptive text about lesser-known entities using the web and large language models (LLMs), so that domain experts can build effective task-specific classifiers from only entity names and gold labels.
Contribution
The paper introduces a novel text acquisition method that leverages both the web and LLMs, making it easy to create entity classifiers from limited training data.
Findings
Achieved a macro F1-score of 82.3% on SIC (Standard Industrial Classification) code classification of organizations.
Achieved a macro F1-score of 72.9% on healthcare provider taxonomy classification.
Demonstrated effectiveness across two distinct real-world domains.
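For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    pairs = list(zip(gold, predicted))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
        fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
        fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with made-up labels (not data from the paper).
gold = ["a", "a", "b", "c"]
predicted = ["a", "b", "b", "c"]
score = macro_f1(gold, predicted)
```

Because every class contributes equally, a classifier that ignores small categories is penalized more heavily under macro F1 than under accuracy, which matters for taxonomies with many sparsely populated codes.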
Abstract
Existing Natural Language Processing (NLP) resources often lack the task-specific information required for real-world problems and provide limited coverage of lesser-known or newly introduced entities. For example, business organizations and health care providers may need to be classified into a variety of different taxonomic schemes for specific application tasks. Our goal is to enable domain experts to easily create a task-specific classifier for entities by providing only entity names and gold labels as training data. Our framework then dynamically acquires descriptive text about each entity, which is subsequently used as the basis for producing a text-based classifier. We propose a novel text acquisition method that leverages both web and large language models (LLMs). We evaluate our proposed framework on two classification problems in distinct domains: (i) classifying organizations…
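The workflow described in the abstract (entity names and gold labels in, descriptive text acquired per entity, text-based classifier out) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `acquire_text` is a hypothetical stand-in that returns canned descriptions instead of querying the web or an LLM, the multinomial Naive Bayes classifier is swapped in for simplicity, and all entity names, descriptions, and labels are invented.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL canned descriptions standing in for text acquired from the web or an LLM.
CANNED_TEXT = {
    "Sunrise Diner": "diner serving breakfast and coffee",
    "Roma Kitchen": "family restaurant serving italian pasta and pizza",
    "First Harbor Credit": "bank offering loans savings accounts and mortgages",
    "Oak Street Lending": "lender offering personal loans and mortgages",
    "Pasta Palace": "restaurant serving pasta and pizza",
    "Harbor Loans": "lender offering loans and savings accounts",
}


def acquire_text(entity_name):
    """Hypothetical text acquisition step; falls back to the name itself."""
    return CANNED_TEXT.get(entity_name, entity_name)


def tokenize(text):
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed, simple classifier)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# The domain expert supplies only entity names and gold labels.
training_data = [
    ("Sunrise Diner", "food"),
    ("Roma Kitchen", "food"),
    ("First Harbor Credit", "finance"),
    ("Oak Street Lending", "finance"),
]
names, labels = zip(*training_data)
classifier = NaiveBayesTextClassifier().fit([acquire_text(n) for n in names], labels)


def classify(entity_name):
    """Acquire text for an unseen entity, then classify that text."""
    return classifier.predict(acquire_text(entity_name))
```

The point of the design is that the classifier never sees the entity name directly; it sees whatever descriptive text the acquisition step returns, so the quality of that step bounds the quality of the final labels.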
