Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change
Paulo Pirozelli, Marcos M. José, Igor Silveira, Flávio Nakasato, Sarajane M. Peres, Anarosa A. F. Brandão, Anna H. R. Costa, Fabio G. Cozman

TL;DR
This paper introduces Pirá 2.0, a comprehensive benchmark dataset for scientific reading comprehension about the ocean, Brazilian coast, and climate change, including new benchmarks, dataset improvements, and baseline results.
Contribution
It defines six new benchmarks over Pirá, releases a curated and extended version of the dataset, and provides baseline results to facilitate future research in scientific question answering.
Findings
Established six benchmark tasks for Pirá 2.0
Curated and extended dataset with translations and paraphrases
Provided baseline results for future comparison
Abstract
Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pirá. By creating these baselines, researchers can more easily utilize Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
