MRCLens: an MRC Dataset Bias Detection Toolkit

Yifan Zhong; Haohan Wang; Eric P. Xing

arXiv:2207.08943 · cs.CL · July 20, 2022

TL;DR

MRCLens is a toolkit designed to detect dataset biases in Machine Reading Comprehension, helping researchers identify biases early to improve model generalization and robustness.

Contribution

It introduces MRCLens, a novel toolkit for early bias detection in MRC datasets, along with a categorization of common biases.

Findings

1. Detects biases before full model training
2. Categorizes common MRC dataset biases
3. Facilitates bias-aware data and model adjustments

Abstract

Many recent neural models have achieved remarkable empirical results in Machine Reading Comprehension (MRC), but evidence suggests that these models sometimes exploit dataset biases to make predictions and consequently fail to generalize to out-of-sample data. Many approaches have been proposed to address this issue from a computational perspective, such as new architectures or training procedures. We believe, however, that a method allowing researchers to discover biases and adjust the data or the models at an earlier stage would be beneficial. We therefore introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. To facilitate the presentation of the toolkit, we also provide a categorization of common biases in MRC.
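
The abstract does not describe MRCLens's interface or list its bias categories. The following is therefore only a rough, hypothetical illustration of the general idea of checking a dataset for a bias before training any model. It measures one shortcut that has been studied in MRC: gold answers concentrated in the first sentence of the passage. This check may or may not correspond to a category in the paper. Several elements are assumptions made for this sketch and are not MRCLens's API: the SQuAD-style record format (context, answer_start), the function names, the naive sentence splitter, and the 0.5 threshold.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed record format (SQuAD-style): (context, answer_start), where
# answer_start is the character offset of the gold answer in the context.
Example = Tuple[str, int]


def answer_sentence_index(context: str, answer_start: int) -> int:
    """Return the 0-based index of the sentence containing the answer.

    Sentences are split naively on ". " purely for illustration; a real
    check would use a proper sentence splitter.
    """
    # Each ". " before the answer offset closes one earlier sentence.
    return context[:answer_start].count(". ")


def position_histogram(examples: Iterable[Example]) -> Counter:
    """Count how many answers fall in sentence 0, 1, 2, ... of their context."""
    return Counter(answer_sentence_index(c, s) for c, s in examples)


def first_sentence_share(examples: Iterable[Example]) -> float:
    """Fraction of examples whose answer lies in the first sentence."""
    hist = position_histogram(examples)
    total = sum(hist.values())
    # Counter returns 0 for missing keys, so hist[0] is safe.
    return hist[0] / total if total else 0.0


def flag_position_bias(examples: Iterable[Example], threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag the dataset if the first-sentence share
    exceeds `threshold` (an arbitrary cutoff chosen for this sketch)."""
    return first_sentence_share(examples) > threshold


if __name__ == "__main__":
    toy = [
        ("Paris is the capital of France. It is large.", 0),
        ("The sky is blue. Grass is green.", 17),
        ("Cats purr. Dogs bark. Birds sing.", 0),
    ]
    print(position_histogram(toy))      # Counter({0: 2, 1: 1})
    print(first_sentence_share(toy))    # 0.6666666666666666
    print(flag_position_bias(toy))      # True
```

The check needs only the annotations, not a trained model, which matches the abstract's emphasis on detection before full training. A high share on its own does not prove a harmful bias, because some passage genres naturally put key facts first. It only suggests where a closer look may be warranted.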

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification