ACL-rlg: A Dataset for Reading List Generation

Julien Aubert-Béduchaud (LS2N); Florian Boudin (LS2N, JFLI); Béatrice Daille (LS2N); Richard Dufour (LS2N)

arXiv:2502.15692 · cs.DL · February 25, 2025

Open Access

TL;DR

This paper introduces ACL-rlg, the largest open expert-annotated dataset for reading list generation. It frames the task as retrieval and evaluates several baselines, including GPT-4o, finding that traditional scholarly search engines perform poorly on it.

Contribution

The paper presents ACL-rlg, an open expert-annotated dataset for reading list generation, formally defines the task as retrieval, and provides baselines for evaluating it.

Findings

01. Traditional scholarly search engines and indexing methods perform poorly on reading list generation.

02. GPT-4o achieves better results but shows signs of potential data contamination.

03. The dataset enables new research in automated scholarly reference compilation.

Abstract

Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights the fact that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination.
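Because the abstract frames reading list generation as a retrieval task, a system's output can be scored by comparing its ranked results with an expert-compiled list. The sketch below is illustrative only: the choice of precision and recall at k, the function name `precision_recall_at_k`, and the paper IDs are assumptions made for this example, not the metrics or data reported in the paper.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Score the top-k retrieved paper IDs against an expert reading list."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Drop duplicate IDs while keeping rank order, then cut at k.
    top_k = list(dict.fromkeys(ranked))[:k]
    hits = sum(1 for paper_id in top_k if paper_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs, for illustration only (not taken from ACL-rlg).
expert_list = {"P1", "P2", "P3", "P4"}
retrieved = ["P2", "P9", "P1", "P7", "P3"]

precision, recall = precision_recall_at_k(retrieved, expert_list, k=5)
print(f"P@5={precision:.2f} R@5={recall:.2f}")
```

Here three of the five retrieved papers appear in the expert list, so precision at 5 is 3/5 and recall at 5 is 3/4. Any ranked retriever, whether a search engine or a language model prompted for references, could be scored the same way.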

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Recommender Systems and Techniques