Auto-survey Challenge

Thanh Gia Hieu Khuong (TAU; LISN); Benedictus Kent Rachmat (TAU; LISN)

arXiv:2310.04480·cs.CL·October 11, 2023

TL;DR

This paper introduces a platform and competition for evaluating Large Language Models' ability to autonomously generate and critique survey papers across various disciplines, simulating peer review.

Contribution

It presents a novel evaluation framework for LLMs involving autonomous survey creation and critique, with a structured competition and assessment criteria.

Findings

01. Baseline models demonstrated varying levels of quality in survey generation.

02. Evaluation methods effectively measured clarity, references, and content value.

03. The platform enables benchmarking LLM capabilities in scholarly tasks.

Abstract

We present a novel platform for evaluating the capability of Large Language Models (LLMs) to autonomously compose and critique survey papers spanning a vast array of disciplines, including sciences, humanities, education, and law. Within this framework, AI systems undertake a simulated peer-review mechanism akin to that of traditional scholarly journals, with human organizers serving in an editorial oversight capacity. Using this platform, we organized a competition for the AutoML conference 2023. Entrants are tasked with presenting stand-alone models adept at authoring articles from designated prompts and subsequently appraising them. Assessment criteria include clarity, reference appropriateness, accountability, and the substantive value of the content. This paper presents the design of the competition, including the implementation of baseline submissions and the methods of evaluation.

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Radiomics and Machine Learning in Medical Imaging