Generative AI for automatic topic labelling

Diego Kozlowski, Carolina Pradier, and Pierre Benz

arXiv:2408.07003 · cs.CL · August 14, 2024 · 2 cites

Open Access

TL;DR

This paper evaluates the effectiveness of three large language models in automatically labeling research topics derived from scientific articles, demonstrating that GPT models can accurately generate concise labels, especially with three words.

Contribution

It introduces a novel assessment of LLMs for automatic topic labeling in scientific research, comparing their performance to manual interpretation.

Findings

01

GPT models accurately label topics from keywords

02

Three-word labels better capture research complexity

03

GPT models outperform manual labeling in precision

Abstract

Topic Modeling has become a prominent tool for the study of scientific fields, as it allows for large-scale interpretation of research trends. Nevertheless, the output of these models is structured as a list of keywords, which requires manual interpretation for labelling. This paper proposes to assess the reliability of three LLMs, namely flan, GPT-4o, and GPT-4 mini, for topic labelling. Drawing on previous research leveraging BERTopic, we generate topics from a dataset of all the scientific articles (n=34,797) authored by all biology professors in Switzerland (n=465) between 2008 and 2020, as recorded in the Web of Science database. We assess the output of the three models both quantitatively and qualitatively and find that, first, both GPT models are capable of accurately and precisely labelling topics from the models' output keywords. Second, 3-word labels are preferable to grasp…
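
The labelling step the abstract describes (topic keywords in, a short label out) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function names `build_label_prompt` and `within_word_limit`, the prompt wording, and the example keywords are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
def build_label_prompt(keywords, max_words=3):
    """Build a hypothetical prompt asking an LLM for a short topic label."""
    if not keywords:
        raise ValueError("keywords must not be empty")
    joined = ", ".join(keywords)
    return (
        f"The following keywords describe one research topic: {joined}.\n"
        f"Reply with a label of at most {max_words} words and nothing else."
    )


def within_word_limit(label, max_words=3):
    """Check that a returned label is non-empty and respects the word limit."""
    words = label.split()
    return 0 < len(words) <= max_words


# Illustrative keywords only; they are not taken from the paper's topics.
prompt = build_label_prompt(["protein", "folding", "structure", "binding"])
print(prompt)
```

The prompt string would be sent to whichever model is being evaluated, and `within_word_limit` offers one simple way to check that a returned label respects the requested length before it is compared with a manual label.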

Taxonomy

Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Topic Modeling

Methods: Cosine Annealing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Softmax · Linear Layer · Attention Dropout · Label Smoothing · Dense Connections · Dropout