Comparative Study of Domain Driven Terms Extraction Using Large Language Models
Sandeep Chataut, Tuyen Do, Bichar Dip Shrestha Gurung, Shiva Aryal, Anup Khanal, Carol Lushbough, Etienne Gnimpieba

TL;DR
This study compares the effectiveness of three large language models in extracting domain-specific keywords from datasets, highlighting the importance of prompt engineering and addressing challenges like hallucination and resource demands.
Contribution
It provides a comparative analysis of Llama2-7B, GPT-3.5, and Falcon-7B for keyword extraction, with performance evaluated on the Inspec and PubMed datasets.
Findings
GPT-3.5 achieved the highest Jaccard scores on both datasets.
Prompt engineering significantly influences keyword extraction quality.
Challenges include model complexity, resource use, and hallucination effects.
Abstract
Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data. They are essential to data enrichment because they form the basis for detailed annotations that provide a more insightful and in-depth view of the underlying data. Keyword extraction, also called domain-driven term extraction, is a pivotal task in natural language processing, facilitating information retrieval, document summarization, and content categorization. This review focuses on keyword extraction methods, emphasizing the use of three major Large Language Models (LLMs): Llama2-7B, GPT-3.5, and Falcon-7B. We employed a custom Python package to interface with these LLMs, simplifying keyword extraction. Our study, utilizing the Inspec and PubMed datasets, evaluates the performance of these models. The Jaccard similarity index was used for assessment, yielding scores of 0.64 (Inspec) and 0.21…
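The Jaccard similarity index used for assessment is the size of the intersection of two sets divided by the size of their union. The sketch below shows one way it could be computed for a predicted keyword list against a gold list. The function names, the lowercase-and-strip normalization, the exact-match comparison, and the convention for two empty sets are assumptions made for illustration; the paper's own package and matching rules may differ.

```python
from typing import Iterable, Set


def normalize(keywords: Iterable[str]) -> Set[str]:
    """Lowercase and strip each keyword, dropping empty strings.

    Assumption: the summary does not say how the authors normalized
    keywords, so this is only one plausible choice.
    """
    return {kw.strip().lower() for kw in keywords if kw.strip()}


def jaccard(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| over normalized keyword sets."""
    a, b = normalize(predicted), normalize(gold)
    if not a and not b:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical example data, not taken from Inspec or PubMed.
predicted = ["Neural Networks", "keyword extraction", "LLM"]
gold = ["keyword extraction", "llm", "prompt engineering"]
score = jaccard(predicted, gold)  # 2 shared terms out of 4 distinct terms
```

A dataset-level score would then typically be the mean of this value over all documents, which is again an assumption about the aggregation rather than a detail stated here.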
Taxonomy
Topics: Advanced Text Analysis Techniques · Natural Language Processing Techniques · Web Data Mining and Analysis
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Weight Decay · Adam · Cosine Annealing
