Elsevier OA CC-By Corpus
Daniel Kershaw, Rob Koeling

TL;DR
The paper introduces the Elsevier OA CC-BY corpus, an open dataset of scientific research papers sampled from across disciplines, including full texts, document metadata, and bibliographic information for each reference.
Contribution
It presents the first open, representative corpus of scientific articles with detailed metadata and bibliographic information across multiple disciplines.
Findings
Provides a large, diverse dataset for scientific NLP research.
Enables cross-disciplinary analysis of scientific literature.
Facilitates development of open access scientific text mining tools.
Abstract
We introduce the Elsevier OA CC-BY corpus. This is the first open corpus of scientific research papers that has a representative sample from across scientific disciplines. The corpus includes not only the full text of each article but also the document metadata and the bibliographic information for each reference.
