SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan

TL;DR
SciRIFF is a high-quality, expert-written dataset of 137K instruction-following instances across scientific tasks, enabling improved language model performance in scientific literature understanding.
Contribution
The paper introduces SciRIFF, a novel dataset designed to enhance instruction-following capabilities of language models specifically for scientific literature tasks.
Findings
LLMs finetuned on SciRIFF outperform baselines by 70.6% on out-of-distribution tasks.
SciRIFF covers 54 scientific tasks with complex instructions and structured outputs.
The dataset improves LLMs' ability to extract, summarize, and verify scientific information.
Abstract
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following instances for training and evaluation, covering 54 tasks. These tasks span five core scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF is unique in being entirely expert-written, high-quality instruction-following dataset for extracting and synthesizing information from research literature across diverse scientific fields. It features complex instructions with long input contexts, detailed task descriptions, and structured outputs. To demonstrate its utility, we finetune a series of large language models (LLMs) using a mix of general-domain and SciRIFF instructions. On nine out-of-distribution held-out tasks (referred to as SciRIFF-Eval), LLMs finetuned on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
