IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models

Tao Feng; Lizhen Qu; Zhuang Li; Haolan Zhan; Yuncheng Hua; Gholamreza Haffari

arXiv:2404.13504 · cs.CL · April 23, 2024

TL;DR

This paper introduces IMO, a method that learns sparse invariant features and uses token-level attention to improve out-of-distribution text classification with pre-trained models, significantly outperforming baselines.

Contribution

The paper presents IMO, a novel approach combining sparse feature masks and token-level attention to enhance domain generalization in text classification.

Findings

01 IMO outperforms baseline models across multiple metrics

02 Sparse masks effectively remove irrelevant features

03 Token-level attention improves focus on predictive tokens

Abstract

Machine learning models have made incredible progress, but they still struggle when applied to examples from unseen domains. This study focuses on a specific domain generalization problem in which a model is trained on one source domain and tested on multiple target domains that are unseen during training. We propose IMO (Invariant features Masks for Out-of-Distribution text classification), which achieves OOD generalization by learning invariant features. During training, IMO learns sparse mask layers that remove features irrelevant to prediction, so that the remaining features stay invariant. Additionally, IMO has a token-level attention module that focuses on the tokens useful for prediction. Our comprehensive experiments show that IMO substantially outperforms strong baselines across various evaluation metrics and settings.
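The two mechanisms named in the abstract, a sparse feature mask and token-level attention, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the sigmoid gate with a fixed 0.5 threshold, and the softmax-weighted pooling are all assumptions chosen for illustration, and the paper's actual parameterization and training procedure may differ.

```python
import math

def sparse_mask(features, mask_logits, threshold=0.5):
    """Zero out feature dimensions whose gate sigmoid(logit) falls below threshold.

    Hypothetical gating rule for illustration; not taken from the paper.
    """
    gates = [1.0 / (1.0 + math.exp(-z)) for z in mask_logits]
    return [x if g >= threshold else 0.0 for x, g in zip(features, gates)]

def token_attention(token_vectors, token_scores):
    """Softmax the per-token scores and return (weighted sum of vectors, weights)."""
    m = max(token_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in token_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(token_vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, token_vectors))
              for d in range(dim)]
    return pooled, weights

# Toy input: two tokens with three feature dimensions each.
tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask_logits = [3.0, -3.0, 3.0]   # assumed learned logits; dimension 1 is gated off
masked = [sparse_mask(t, mask_logits) for t in tokens]
pooled, weights = token_attention(masked, [0.0, 0.0])  # equal scores, equal weights
```

In this toy run the second feature dimension is removed from every token, and the two tokens contribute equally to the pooled representation because their scores are identical.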

Code & Models

Repositories

williamstoto/imo (PyTorch, official)

Videos

IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models · Underline

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques

Methods: Focus