SparCAssist: A Model Risk Assessment Assistant Based on Sparse Generated Counterfactuals
Zijian Zhang, Vinay Setty, Avishek Anand

TL;DR
SparCAssist is a general-purpose risk assessment tool for NLP models. It generates sparse counterfactuals through token replacements, evaluates model behavior on them, and helps human annotators judge model risk.
Contribution
It introduces an approach to model risk evaluation that uses sparse counterfactuals, with replacement tokens retrieved by HotFlip or masked-language-model-based algorithms, to support deployment decisions.
Findings
Effective in identifying model vulnerabilities
Counterfactuals assist human risk assessment
Potential to improve NLP model robustness
Abstract
We introduce SparCAssist, a general-purpose risk assessment tool for machine learning models trained for language tasks. It evaluates a model's risk by inspecting its behavior on counterfactuals, namely out-of-distribution instances generated from a given data instance. The counterfactuals are generated by replacing tokens in rationale subsequences identified by ExPred, and the replacements are retrieved using HotFlip or Masked-Language-Model-based algorithms. The main purpose of our system is to help human annotators assess the model's risk on deployment. The counterfactual instances generated during the assessment are a by-product and can be used to train more robust NLP models in the future.
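The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_classifier` stands in for the model under assessment, the hand-written rationale mask stands in for ExPred's output, and `toy_candidates` stands in for a HotFlip or masked-language-model proposer. All names here are hypothetical, and treating "sparse" as a single-token replacement is an assumption made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical toy vocabulary; a real system would query a trained model.
POSITIVE = {"great", "good", "enjoyable"}
NEGATIVE = {"awful", "bad", "boring"}


def toy_classifier(tokens: List[str]) -> int:
    """Stand-in model: 1 if positive words outnumber negative ones, else 0."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return 1 if score > 0 else 0


def toy_candidates(tokens: List[str], position: int) -> List[str]:
    """Stand-in for a HotFlip/MLM proposer: a fixed lookup table (assumption)."""
    table = {"great": ["awful", "good"], "enjoyable": ["boring"]}
    return table.get(tokens[position], [])


def sparse_counterfactuals(
    tokens: List[str],
    rationale_mask: List[int],
    predict: Callable[[List[str]], int],
    propose: Callable[[List[str], int], List[str]],
) -> List[Tuple[int, str, List[str]]]:
    """Replace one rationale token at a time; keep edits that change the prediction."""
    original = predict(tokens)
    flips = []
    for i, in_rationale in enumerate(rationale_mask):
        if not in_rationale:
            continue  # only tokens inside the rationale are edited
        for candidate in propose(tokens, i):
            edited = tokens[:i] + [candidate] + tokens[i + 1:]
            if predict(edited) != original:
                flips.append((i, candidate, edited))
    return flips


tokens = ["the", "movie", "was", "great"]
mask = [0, 0, 0, 1]  # hand-written rationale mask (assumption, not ExPred output)
flips = sparse_counterfactuals(tokens, mask, toy_classifier, toy_candidates)
```

In this toy run, replacing "great" with "awful" changes the prediction while replacing it with "good" does not, so only the first edit is returned. An annotator would then inspect such prediction-changing edits to judge whether the model's behavior is reasonable.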
Taxonomy
Methods: Counterfactuals Explanations
