SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation

Hong Chen; Hiroya Takamura; Hideki Nakayama

arXiv:2110.10774 · cs.CL · October 22, 2021

TL;DR

This paper introduces SciXGen, a large-scale dataset for context-aware scientific text generation, and benchmarks its effectiveness in generating descriptions and paragraphs using state-of-the-art models.

Contribution

It presents SciXGen, a new dataset of 205,304 annotated scientific papers, to advance context-aware text generation in the scientific domain.

Findings

01. SciXGen improves scientific text generation quality.

02. State-of-the-art models perform well on the dataset.

03. The dataset is publicly available for research.

Abstract

Generating text for scientific papers requires not only capturing the content of the given input but also, frequently, acquiring external information called context. We advance scientific text generation by proposing a new task, context-aware text generation in the scientific domain, which aims to exploit the contribution of context to the generated text. To this end, we present a novel, challenging, large-scale Scientific Paper Dataset for Context-Aware Text Generation (SciXGen), consisting of 205,304 well-annotated papers with full references to widely used objects (e.g., tables, figures, algorithms) in a paper. We comprehensively benchmark, using state-of-the-art models, the efficacy of our newly constructed SciXGen dataset in generating descriptions and paragraphs. Our dataset and benchmarks will be made publicly…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques