ShortcutLens: A Visual Analytics Approach for Exploring Shortcuts in Natural Language Understanding Dataset

Zhihua Jin; Xingbo Wang; Furui Cheng; Chunhui Sun; Qun Liu; Huamin Qu

arXiv:2208.08010 · cs.HC · January 16, 2023

TL;DR

ShortcutLens is a visual analytics tool designed to help NLU researchers explore and understand shortcuts in benchmark datasets, addressing biases that can mislead model evaluation.

Contribution

The paper introduces ShortcutLens, a visual analytics system that lets NLU experts systematically explore shortcuts in benchmark datasets and apply what they find when creating new ones.

Findings

01 Supports multi-level shortcut exploration

02 Enhances understanding of dataset biases

03 Inspires creation of more challenging benchmarks

Abstract

Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts -- unwanted biases in the benchmark datasets -- can damage the effectiveness of benchmark datasets in revealing models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoid them when creating benchmark datasets. In this paper, we develop a visual analytics system, ShortcutLens, to help NLU experts explore shortcuts in NLU benchmark datasets. The system allows users to conduct multi-level exploration of shortcuts. Specifically, Statistics View helps users grasp the statistics such as coverage and productivity of shortcuts in the benchmark dataset. Template View employs hierarchical and interpretable templates to summarize different types of shortcuts. Instance…
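
The abstract names coverage and productivity as shortcut statistics but does not define them. The sketch below assumes one common reading: coverage is the fraction of instances that contain a candidate token, and productivity is the fraction of those instances that carry the token's most frequent label. The function `shortcut_stats`, the whitespace tokenization, and the toy data are all hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter


def shortcut_stats(dataset, token):
    """Return (coverage, productivity, top_label) for a candidate shortcut token.

    dataset: list of (text, label) pairs.
    Assumed definitions (not taken from the paper):
      coverage     = share of instances whose text contains the token
      productivity = share of those instances that have the most frequent label
    """
    # Labels of every instance containing the token (naive whitespace tokenization).
    labels_with_token = [
        label for text, label in dataset if token in text.lower().split()
    ]
    if not labels_with_token:
        return 0.0, 0.0, None

    coverage = len(labels_with_token) / len(dataset)
    top_label, top_count = Counter(labels_with_token).most_common(1)[0]
    productivity = top_count / len(labels_with_token)
    return coverage, productivity, top_label


# Toy data, invented for illustration only.
toy = [
    ("the man is not sleeping", "contradiction"),
    ("nobody is not outside", "contradiction"),
    ("a dog is not a cat", "entailment"),
    ("a woman is singing", "entailment"),
    ("children play outside", "neutral"),
]

coverage, productivity, top_label = shortcut_stats(toy, "not")
print(coverage, productivity, top_label)
```

Under these assumed definitions, a token with both high coverage and high productivity appears often and strongly predicts a single label, which makes it a candidate shortcut worth inspecting.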

Taxonomy

Topics: Data Visualization and Analytics · Computational and Text Analysis Methods · Multimodal Machine Learning Applications