Attribution-Guided Masking for Robust Cross-Domain Sentiment Classification
Shubham Harkare, Arvind Yogesh Suresh Babu, Yash Kulkarni

TL;DR
This paper introduces Attribution-Guided Masking (AGM), a training method that improves cross-domain sentiment classification by reducing reliance on domain-specific tokens, enhancing generalization without target labels.
Contribution
The paper proposes AGM, a novel attribution-guided masking technique that dynamically penalizes spurious tokens during fine-tuning, improving zero-shot transfer performance and interpretability.
Findings
AGM achieves transfer accuracy competitive with strong baselines.
AGM suppresses attribution on domain-specific tokens like mentions and hashtags.
Removing attribution-guided masking degrades transfer performance.
Abstract
While pre-trained Transformer models achieve high accuracy on in-domain sentiment classification, they frequently experience severe performance degradation when transferring to out-of-domain data. We hypothesize that this generalization gap is driven by reliance on domain-specific spurious tokens. After demonstrating that post-hoc token-level attribution drift fails to predict this gap, we propose Attribution-Guided Masking (AGM), a training-time intervention that dynamically detects and penalizes highly attributed spurious tokens during fine-tuning. AGM's core component is a gradient-based attribution masking loss (), which can optionally be combined with a counterfactual contrastive loss to enforce domain-invariant representations, all without requiring target-domain labels or human annotation. Evaluated in a strict zero-shot transfer setting across four diverse…
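The abstract does not give the exact form of the attribution masking loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy mean-pooled linear scorer (so gradient x input attribution can be written in closed form), a hypothetical `is_domain_specific` rule that flags mentions and hashtags, and hypothetical `threshold` and `lam` hyperparameters. The penalty is the share of total attribution that falls on flagged, highly attributed tokens.

```python
def token_attributions(embeddings, weights):
    """Gradient x input attribution for a toy mean-pooled linear scorer.

    For score = w . mean(e_i) + b, d(score)/d(e_i) = w / n, so the
    gradient x input attribution of token i is (w . e_i) / n.
    """
    n = len(embeddings)
    return [sum(w * x for w, x in zip(weights, e)) / n for e in embeddings]


def attribution_shares(attributions, eps=1e-12):
    """Normalise absolute attributions so they sum to (almost exactly) 1."""
    total = sum(abs(a) for a in attributions) + eps
    return [abs(a) / total for a in attributions]


def is_domain_specific(token):
    # Hypothetical detection rule (an assumption for illustration):
    # treat @mentions and #hashtags as domain-specific tokens.
    return token.startswith("@") or token.startswith("#")


def masking_loss(tokens, embeddings, weights, threshold=0.2):
    """Attribution mass on flagged tokens whose share exceeds `threshold`."""
    shares = attribution_shares(token_attributions(embeddings, weights))
    return sum(
        s for t, s in zip(tokens, shares)
        if is_domain_specific(t) and s > threshold
    )


def total_loss(task_loss, mask_loss, lam=0.5):
    """Task loss plus the weighted masking penalty (lam is an assumed knob)."""
    return task_loss + lam * mask_loss


# Toy example with made-up embeddings and weights.
tokens = ["@airline", "great", "flight", "#travel"]
embeddings = [[2.0, 0.0], [1.0, 1.0], [0.0, 0.5], [0.1, 0.0]]
weights = [1.0, 1.0]

penalty = masking_loss(tokens, embeddings, weights)
print(f"masking penalty: {penalty:.4f}")
print(f"total loss: {total_loss(0.7, penalty):.4f}")
```

In a real fine-tuning setup the attributions would come from automatic differentiation through the Transformer rather than a closed form, and the penalty would be minimised jointly with the classification loss so that the model shifts attribution away from the flagged tokens.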
