Designing large language model prompts to extract scores from messy text: A shared dataset and challenge
Mike Thelwall

TL;DR
This paper introduces a shared dataset of messy texts with research scores, and challenges the community to design prompts for LLMs to accurately extract these scores, aiming to improve over a baseline accuracy of 72.6%.
Contribution
It provides a novel dataset and challenge framework for prompt design to extract structured scores from unstructured, messy texts using LLMs.
Findings
Baseline accuracy of 72.6% for score extraction
Dataset includes 1446 texts with varying score formats
Challenge aims to improve prompt-based extraction methods
Abstract
In some areas of computing, natural language processing and information science, progress is made by sharing datasets and challenging the community to design the best algorithm for an associated task. This article introduces a shared dataset of 1446 short texts, each of which describes a research quality score on the UK scale of 1* to 4*. This is a messy collection, with some texts not containing scores and others including invalid scores or strange formats. The dataset is accompanied by a description of what constitutes a valid score and a "gold standard" of the correct scores for these texts (including missing values). The challenge is to design a prompt for Large Language Models (LLMs) to extract the scores from these texts as accurately as possible. The response should be a number and no other text, so there are two aspects to the challenge: ensuring that the LLM…
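As an illustration of how a prompt's output might be checked against the gold standard, here is a minimal sketch. It is not the paper's evaluation code. The names `parse_response` and `accuracy` are hypothetical, and two assumptions are made: a usable response is a bare number between 1 and 4, and any other response is treated as a missing value (`None`). The dataset's own definition of a valid score may differ.

```python
import re
from typing import Optional, Sequence

# Assumption: a usable response is a bare number and nothing else.
_BARE_NUMBER = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*$")


def parse_response(response: str) -> Optional[float]:
    """Return the score if the response is a bare number in [1, 4], else None."""
    match = _BARE_NUMBER.match(response)
    if match is None:
        return None  # extra text, stars, or no number at all
    value = float(match.group(1))
    # Assumption: numbers outside the 1 to 4 range count as invalid.
    return value if 1.0 <= value <= 4.0 else None


def accuracy(responses: Sequence[str], gold: Sequence[Optional[float]]) -> float:
    """Fraction of responses whose parsed value equals the gold value.

    A gold value of None marks a text with no valid score (assumed encoding).
    """
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(parse_response(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)
```

With this sketch, a response such as `"4*"` is counted as wrong even when the gold score is 4, which reflects both aspects of the challenge: the model has to find the right score and also return it in the required format.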
Taxonomy
Topics: Computational and Text Analysis Methods · Text Readability and Simplification · Topic Modeling
