ACL-rlg: A Dataset for Reading List Generation
Julien Aubert-Béduchaud (LS2N), Florian Boudin (LS2N, JFLI), Béatrice Daille (LS2N), Richard Dufour (LS2N)

TL;DR
This paper introduces ACL-rlg, the largest open expert-annotated dataset for reading list generation. It frames the task as retrieval and evaluates several baselines, finding that traditional scholarly search engines perform poorly and that GPT-4o, while better, shows signs of potential data contamination.
Contribution
The paper presents ACL-rlg, a new large-scale dataset for reading list generation, and establishes evaluation baselines, addressing a gap in scholarly literature retrieval.
Findings
Traditional search engines perform poorly on reading list generation.
GPT-4o shows better results but exhibits signs of potential data contamination.
The dataset enables new research in automated scholarly reference compilation.
Abstract
Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights the fact that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination.
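The abstract does not say which metrics the baselines are scored with. As an illustration only of what treating reading list generation as a retrieval task can look like, the sketch below scores a ranked list of retrieved papers against an expert reading list using standard precision and recall at k. The function name, the paper IDs, and the choice of metric are assumptions made for this example, not details taken from the paper.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Score the top-k of a ranked list of paper IDs against an expert reading list."""
    top_k = retrieved[:k]
    relevant_set = set(relevant)
    # Count retrieved papers that also appear in the expert list.
    hits = sum(1 for paper_id in top_k if paper_id in relevant_set)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical IDs, for illustration only (not drawn from ACL-rlg).
expert_list = ["P1", "P2", "P3", "P4"]
system_ranking = ["P2", "P9", "P1", "P7", "P4"]

p, r = precision_recall_at_k(system_ranking, expert_list, k=5)
print(f"P@5={p:.2f} R@5={r:.2f}")
```

Under this framing, a search engine or a language model is just a function that maps a query (for example, the topic of a reading list) to a ranked list of papers, so very different systems can be compared on the same expert-built ground truth.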
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Recommender Systems and Techniques
