Perhaps PTLMs Should Go to School -- A Task to Assess Open Book and Closed Book QA
Manuel R. Ciosici, Joe Cecil, Alex Hedges, Dong-Ho Lee, Marjorie Freedman, Ralph Weischedel

TL;DR
This paper introduces a new task and benchmark for evaluating how well pre-trained language models (PTLMs) understand instructional texts, and highlights their limited zero-shot and knowledge-transfer capabilities in open-book and closed-book question answering.
Contribution
It proposes a novel educational question-answering task with a new dataset and leaderboard, assessing PTLMs' ability to understand and utilize textbook content in different settings.
Findings
PTLMs score around 50-56% on the task, close to the ~50% expected from random guessing on the balanced true/false statements.
Adding the textbooks to pre-training yields minimal performance gains.
The open-book setting improves accuracy to about 60%.
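To make the random-chance comparison concrete, here is a minimal sketch of how accuracy on a balanced true/false set relates to a trivial baseline. The `accuracy` helper and the 100-statement label list are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Sequence


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of true/false predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical balanced set: half the statements are true, half are false.
labels = [True, False] * 50

# A trivial baseline that answers "true" for every statement.
always_true = [True] * len(labels)
baseline = accuracy(always_true, labels)
print(f"always-true baseline accuracy: {baseline:.2f}")
```

Because the labels are balanced, any strategy that ignores the statement content lands at 50%, which is the reference point for reading the 50-56% and 60% figures above.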
Abstract
Our goal is to deliver a new task and leaderboard to stimulate research on question answering and pre-trained language models (PTLMs) to understand a significant instructional document, e.g., an introductory college textbook or a manual. PTLMs have shown great success in many question-answering tasks, given significant supervised training, but much less so in zero-shot settings. We propose a new task that includes two college-level introductory texts in the social sciences (American Government 2e) and humanities (U.S. History), hundreds of true/false statements based on review questions written by the textbook authors, validation/development tests based on the first eight chapters of the textbooks, blind tests based on the remaining textbook chapters, and baseline results given state-of-the-art PTLMs. Since the questions are balanced, random performance should be ~50%. T5, fine-tuned…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Residual Connection · Byte Pair Encoding · SentencePiece · Dropout · Dense Connections · Softmax · Gated Linear Unit
