SparCAssist: A Model Risk Assessment Assistant Based on Sparse Generated Counterfactuals

Zijian Zhang; Vinay Setty; Avishek Anand

arXiv:2205.01588·cs.CL·May 4, 2022

TL;DR

SparCAssist is a general-purpose risk assessment tool for NLP models. It generates sparse counterfactuals by replacing tokens in an input and shows the model's behavior on them to help human annotators assess risk.

Contribution

It introduces an approach to model risk evaluation based on sparse counterfactuals, with replacement tokens retrieved by HotFlip or masked-language-model-based algorithms, to support deployment decisions.

Findings

01

Effective in identifying model vulnerabilities

02

Counterfactuals assist human risk assessment

03

Potential to improve NLP model robustness

Abstract

We introduce SparCAssist, a general-purpose risk assessment tool for machine learning models trained for language tasks. It evaluates a model's risk by inspecting its behavior on counterfactuals, namely out-of-distribution instances generated from a given data instance. The counterfactuals are generated by replacing tokens in rationale subsequences identified by ExPred, and the replacements are retrieved using HotFlip or masked-language-model-based algorithms. The main purpose of our system is to help human annotators assess the model's risk at deployment. The counterfactual instances generated during the assessment are a by-product and can be used to train more robust NLP models in the future.
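The token-replacement idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_sentiment` is a hypothetical stand-in for the model under assessment, the rationale mask is hand-written rather than produced by ExPred, and the candidate table is a fixed dictionary rather than the output of HotFlip or a masked language model. It only shows the general shape of a sparse (single-token) counterfactual search, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def toy_sentiment(tokens: Sequence[str]) -> int:
    """Hypothetical stand-in classifier: 1 if positive words outnumber negative ones."""
    positive = {"great", "good", "wonderful"}
    negative = {"awful", "bad", "boring"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1 if score > 0 else 0


def sparse_counterfactual(
    tokens: Sequence[str],
    rationale_mask: Sequence[int],
    candidates: Dict[str, List[str]],
    predict: Callable[[Sequence[str]], int],
) -> Optional[Tuple[List[str], int]]:
    """Return the first single-token edit inside the rationale that changes the
    prediction, as (edited_tokens, edited_index), or None if no edit does."""
    original_label = predict(tokens)
    for i, in_rationale in enumerate(rationale_mask):
        if not in_rationale:
            continue  # tokens outside the rationale are left untouched
        for replacement in candidates.get(tokens[i], []):
            edited = list(tokens)
            edited[i] = replacement
            if predict(edited) != original_label:
                return edited, i
    return None


# Hand-written inputs (assumptions): the rationale covers only the last token.
tokens = ["the", "movie", "was", "great"]
mask = [0, 0, 0, 1]
candidates = {"great": ["fine", "awful"]}
result = sparse_counterfactual(tokens, mask, candidates, toy_sentiment)
```

Restricting edits to rationale tokens and stopping at the first label change keeps the counterfactual sparse, which makes it easier for an annotator to see which single replacement changed the model's output.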

Taxonomy

Methods: Counterfactual Explanations