STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
Danilo Ribeiro, Shen Wang, Xiaofei Ma, Henry Zhu, Rui Dong, Deguang Kong, Juliette Burger, Anjelica Ramos, William Wang, Zhiheng Huang, George Karypis, Bing Xiang, Dan Roth

TL;DR
STREET is a comprehensive benchmark designed to evaluate models on multi-task, multi-domain reasoning and explanation generation, emphasizing the importance of structured, step-by-step reasoning in natural language understanding.
Contribution
This work introduces STREET, a novel benchmark that combines multi-task and multi-domain reasoning with explanation generation, facilitating better training and evaluation of models on complex reasoning tasks.
Findings
Popular language models, such as few-shot prompted GPT-3 and fine-tuned T5, still lag behind human performance when producing structured reasoning steps.
Structured explanations are challenging for current models to generate accurately.
The benchmark encourages development of models capable of multi-step reasoning and explanation.
Abstract
We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. Unlike most existing question-answering (QA) datasets, we expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer. We perform extensive evaluation with popular language models such as few-shot prompting GPT-3 and fine-tuned T5. We find that these models still lag behind human performance when producing such structured reasoning steps. We believe this work will provide a way for the community to better train and test systems on multi-step reasoning and explanations in natural language.
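To make the idea of a structured explanation more concrete, the Python sketch below represents an explanation as an ordered list of steps over identified premises. It checks that each step builds only on premises or on conclusions derived earlier, and that the final step reaches the conclusion that supports the answer. The `Step` class, the `check_explanation` function, the node ids (`p1`, `c1`, ...) and the toy premises are all hypothetical illustrations. This is not STREET's actual data format or its evaluation metric.

```python
from dataclasses import dataclass

# Hypothetical representation of a structured explanation; NOT the STREET data format.


@dataclass(frozen=True)
class Step:
    """One reasoning step: existing nodes (inputs) yield a new conclusion node."""
    inputs: tuple[str, ...]  # ids of premises or previously derived conclusions
    output: str              # id of the intermediate conclusion this step produces
    text: str                # natural-language statement of that conclusion


def check_explanation(premises: dict[str, str], steps: list[Step], answer_node: str) -> bool:
    """Return True if every step uses only already-established nodes and
    the answer node is derived by some step (not merely given as a premise)."""
    known = set(premises)
    for step in steps:
        if not step.inputs:
            return False  # a step must build on at least one existing node
        if any(i not in known for i in step.inputs):
            return False  # step depends on a node that is not yet established
        if step.output in known:
            return False  # each node id must be introduced exactly once
        known.add(step.output)
    return answer_node in known and answer_node not in premises


# Toy example (hypothetical data): premises from a question, two reasoning steps.
premises = {
    "p1": "A metal spoon is placed in a bowl of hot soup.",
    "p2": "Metal is a good conductor of heat.",
    "p3": "Heat moves from hotter objects to cooler objects.",
}
steps = [
    Step(("p1", "p3"), "c1", "Heat moves from the soup into the spoon."),
    Step(("c1", "p2"), "c2", "The handle of the spoon becomes hot."),
]

if __name__ == "__main__":
    print(check_explanation(premises, steps, "c2"))                  # True
    print(check_explanation(premises, list(reversed(steps)), "c2"))  # False
```

The ordering check is what separates a step-by-step explanation from an unordered bag of facts. When the two steps are reversed, `c1` is not yet available at the point where the second step needs it, so the check fails.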
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Attention Is All You Need · Linear Layer · Adafactor · Attention Dropout · Cosine Annealing · Dense Connections · SentencePiece
