ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

Uri Shaham; Maor Ivgi; Avia Efrat; Jonathan Berant; Omer Levy

arXiv:2305.14196 · cs.CL · December 19, 2023 · 1 citation

TL;DR

ZeroSCROLLS is a zero-shot benchmark that evaluates how well large language models understand long texts across diverse tasks, using only test and small validation sets and no training data. GPT-4 achieves the highest average score, while aggregation tasks remain an open challenge.

Contribution

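The paper introduces ZeroSCROLLS, a zero-shot benchmark for long text understanding that adapts six tasks from the SCROLLS benchmark and adds four new datasets, including two information fusing tasks. It also provides a comprehensive evaluation of both open-source and closed large language models.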

Findings

01 Claude outperforms ChatGPT on ZeroSCROLLS.

02 GPT-4 achieves the highest average score among the evaluated models.

03 Models struggle to pass the naive baseline on aggregation tasks, leaving room for improvement.

Abstract

We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts, which contains only test and small validation sets, without training data. We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks, such as aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive evaluation of both open-source and closed large language models, finding that Claude outperforms ChatGPT, and that GPT-4 achieves the highest average score. However, there is still room for improvement on multiple open challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to pass the naive baseline. As the state of the art is a moving target, we invite researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.
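The aggregation task mentioned in the abstract (estimating the percentage of positive reviews) can be illustrated with a small sketch. This is not the benchmark's own scoring code: the function names, the constant 50% "naive" prediction, the absolute-error measure, and all numbers below are assumptions made purely for illustration.

```python
from typing import List


def positive_percentage(labels: List[bool]) -> float:
    """Gold answer for a hypothetical example: share of positive reviews, in [0, 100]."""
    if not labels:
        raise ValueError("need at least one review label")
    return 100.0 * sum(labels) / len(labels)


def absolute_error(predicted: float, gold: float) -> float:
    """Illustrative error measure (an assumption, not the benchmark's official metric)."""
    return abs(predicted - gold)


# Hypothetical input: 50 reviews, 37 of them positive.
labels = [True] * 37 + [False] * 13
gold = positive_percentage(labels)

# Assumed naive baseline: always answer 50%, without reading any review.
naive_prediction = 50.0
# Made-up model answer, used only to show the comparison.
model_prediction = 70.0

naive_error = absolute_error(naive_prediction, gold)
model_error = absolute_error(model_prediction, gold)

print(f"gold={gold:.1f}%  naive error={naive_error:.1f}  model error={model_error:.1f}")
```

Under these assumptions, a model "passes the naive baseline" on an example when its error is smaller than the error of the constant guess. The point of the sketch is that the answer depends on every review in the input, so it cannot be read off from any single passage.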

Code & Models

Repositories

tau-nlp/zero_scrolls
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Test · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing