Exploring the Latest LLMs for Leaderboard Extraction

Salomon Kabongo, Jennifer D'Souza, and Sören Auer

arXiv:2406.04383·cs.CL·July 10, 2024·1 cites

TL;DR

This study evaluates the effectiveness of various Large Language Models in extracting structured leaderboard data from AI research papers, comparing different input contexts to identify best practices for automation.

Contribution

It systematically assesses multiple LLMs and input formats for leaderboard extraction, providing new insights into their relative performance and limitations.

Findings

01

GPT-4-Turbo performs best among tested models.

02

The DocREC context type yields higher accuracy than the other context types.

03

Model performance varies significantly with the input format.

Abstract

The rapid advancements in Large Language Models (LLMs) have opened new avenues for automating complex tasks in AI research. This paper investigates the efficacy of different LLMs (Mistral 7B, Llama-2, GPT-4-Turbo, and GPT-4.o) in extracting leaderboard information from empirical AI research articles. We explore three types of contextual inputs to the models: DocTAET (Document Title, Abstract, Experimental Setup, and Tabular Information), DocREC (Results, Experiments, and Conclusions), and DocFULL (entire document). Our comprehensive study evaluates the performance of these models in generating (Task, Dataset, Metric, Score) quadruples from research papers. The findings reveal significant insights into the strengths and limitations of each model and context type, providing valuable guidance for future AI research automation efforts.
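To make the three context types and the output quadruple concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `LeaderboardEntry`, `CONTEXT_SECTIONS`, and `build_context`, along with the lowercase section labels, are hypothetical and chosen only for illustration. It assumes a paper has already been split into named sections.

```python
from dataclasses import dataclass

# Sections selected by each context type, following the abstract's definitions.
# The key names and section labels are illustrative assumptions.
CONTEXT_SECTIONS = {
    "DocTAET": ["title", "abstract", "experimental_setup", "tables"],
    "DocREC": ["results", "experiments", "conclusions"],
}


@dataclass(frozen=True)
class LeaderboardEntry:
    """One (Task, Dataset, Metric, Score) quadruple."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text, since reported scores may carry units or symbols


def build_context(sections: dict[str, str], context_type: str) -> str:
    """Concatenate the paper sections that a given context type selects."""
    if context_type == "DocFULL":
        selected = list(sections)  # entire document, in stored order
    else:
        # Skip sections the paper does not have; unknown types raise KeyError.
        selected = [s for s in CONTEXT_SECTIONS[context_type] if s in sections]
    return "\n\n".join(sections[name] for name in selected)
```

Under these assumptions, `build_context(paper_sections, "DocREC")` would produce the shorter results-focused input, while `"DocFULL"` passes every section through unchanged. The model's answer would then be parsed into a list of `LeaderboardEntry` values.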

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer