MRCLens: an MRC Dataset Bias Detection Toolkit
Yifan Zhong, Haohan Wang, Eric P. Xing

TL;DR
MRCLens is a toolkit designed to detect dataset biases in Machine Reading Comprehension, helping researchers identify biases early to improve model generalization and robustness.
Contribution
It introduces MRCLens, a novel toolkit for early bias detection in MRC datasets, along with a categorization of common biases.
Findings
Detects biases before full model training
Categorizes common MRC dataset biases
Facilitates bias-aware data and model adjustments
Abstract
Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension (MRC), but evidence suggests that these models sometimes exploit dataset biases to make predictions and then fail to generalize to out-of-sample data. While many approaches address this issue from the computational perspective, such as new architectures or training procedures, we believe that a method that lets researchers discover biases and adjust the data or the models at an earlier stage will be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.
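To make the idea of checking for bias before full training concrete, here is a minimal sketch of one such check: measuring how often the answer falls in each sentence of the passage. It is not MRCLens's API or method, which this summary does not describe. The function name `answer_position_report`, the SQuAD-style fields `context` and `answer_start`, and the naive sentence split on ". " are all assumptions made for illustration.

```python
from collections import Counter


def answer_position_report(examples):
    """Return the fraction of answers that start in each sentence index.

    `examples` is assumed to be an iterable of dicts with a `context`
    string and an integer `answer_start` character offset (SQuAD-style).
    Sentences are approximated by counting ". " before the answer, which
    is a deliberate simplification rather than real sentence splitting.
    """
    counts = Counter()
    for ex in examples:
        prefix = ex["context"][: ex["answer_start"]]
        sentence_index = prefix.count(". ")
        counts[sentence_index] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {idx: n / total for idx, n in sorted(counts.items())}
```

If a large share of answers sits at sentence index 0, a model could score well by attending mostly to the start of the passage, which is one kind of shortcut worth inspecting before investing in full training.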
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
