Stochastic Adversarial Networks for Multi-Domain Text Classification

Xu Wang; Yuan Wu

arXiv:2406.00044 · cs.CL · June 4, 2024


TL;DR

This paper introduces Stochastic Adversarial Networks (SAN) for multi-domain text classification. SAN models the parameters of the domain-specific feature extractor as a distribution, so it can handle many domains without the model size growing with the number of domains.

Contribution

The paper proposes a novel SAN framework that models domain-specific parameters as Gaussian distributions, enabling scalable multi-domain learning with stable adversarial training.
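To make "parameters as a Gaussian distribution" concrete, here is a minimal sketch of drawing feature-extractor weights from a learned Gaussian with the reparameterization trick. It is not the authors' implementation: it assumes a diagonal covariance, a softplus parameterization of the standard deviation, and a single linear layer, and the names `GaussianLinear`, `mu`, `rho`, and `sample_weights` are hypothetical. It is written in pure Python, so there is no autodiff; in a framework such as PyTorch the same expression `mu + sigma * eps` is differentiable with respect to `mu` and `rho`.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus, used to keep standard deviations positive."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


class GaussianLinear:
    """Hypothetical linear layer whose weights follow a diagonal Gaussian.

    Only the mean (mu) and the pre-softplus scale (rho) are stored, so the
    stored parameter count is fixed no matter how many weight samples are drawn.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        self.mu = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.rho = [[-3.0] * in_dim for _ in range(out_dim)]

    def num_stored_params(self) -> int:
        # One mean and one scale per weight entry.
        return 2 * self.in_dim * self.out_dim

    def sample_weights(self, rng: random.Random) -> list[list[float]]:
        # Reparameterization trick: W = mu + sigma * eps, with eps ~ N(0, 1).
        return [
            [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(self.mu, self.rho)
        ]

    def forward(self, x: list[float], weights: list[list[float]]) -> list[float]:
        # Plain matrix-vector product with a sampled weight matrix.
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


layer = GaussianLinear(in_dim=4, out_dim=3)
rng = random.Random(42)
# Draw several weight samples (candidate private extractors);
# the stored parameter count does not change.
extractors = [layer.sample_weights(rng) for _ in range(16)]
features = layer.forward([1.0, 0.5, -0.5, 2.0], extractors[0])
print(layer.num_stored_params(), len(extractors), len(features))  # 24 16 3
```

How sampled extractors are assigned to domains during training and inference is not described on this page, so the sketch deliberately leaves that mapping out.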

Findings

01. SAN achieves competitive results on MDTC benchmarks.

02. Model size remains constant regardless of the number of domains.

03. SAN incorporates domain label smoothing and pseudo-label regularization.
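The page names domain label smoothing and pseudo-label regularization without detailing them. As a minimal sketch, assuming SAN uses the common forms of both ideas (uniform label smoothing on the domain discriminator's one-hot targets, and confidence-thresholded pseudo-labels on unlabeled data), the two ingredients look like this. The function names and the `eps` and `threshold` values are illustrative and not taken from the paper.

```python
def smooth_domain_labels(domain: int, num_domains: int, eps: float = 0.1) -> list[float]:
    """Uniform label smoothing of a one-hot domain label (assumed form)."""
    if not 0 <= domain < num_domains:
        raise ValueError("domain index out of range")
    off = eps / num_domains  # mass spread evenly over all domains
    target = [off] * num_domains
    target[domain] += 1.0 - eps  # the true domain keeps most of the mass
    return target


def select_pseudo_labels(
    probs_batch: list[list[float]], threshold: float = 0.9
) -> list[tuple[int, int]]:
    """Return (example index, predicted label) for confident predictions only."""
    selected = []
    for i, probs in enumerate(probs_batch):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


target = smooth_domain_labels(domain=2, num_domains=4, eps=0.1)
print([round(t, 3) for t in target])  # [0.025, 0.025, 0.925, 0.025]

# Sentiment probabilities for three unlabeled reviews (made-up numbers).
pseudo = select_pseudo_labels([[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]], threshold=0.9)
print(pseudo)  # [(0, 0), (2, 1)]
```

Smoothing keeps the discriminator from becoming overconfident, which is one common way to stabilize adversarial training; the selected pseudo-labels would then feed an extra classification loss on unlabeled data.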

Abstract

Adversarial training has been instrumental in advancing multi-domain text classification (MDTC). Traditionally, MDTC methods employ a shared-private paradigm, with a shared feature extractor for domain-invariant knowledge and individual private feature extractors for domain-specific knowledge. Despite achieving state-of-the-art results, these methods face a parameter count that keeps growing as new domains are added. To address this challenge, we introduce the Stochastic Adversarial Network (SAN), which models the parameters of the domain-specific feature extractor as a multivariate Gaussian distribution rather than as a traditional weight vector. This design allows the generation of numerous domain-specific feature extractors without a substantial increase in model parameters, keeping the model's size on par with that of a single…
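The scaling argument in the abstract is simple arithmetic. A back-of-the-envelope sketch, assuming every feature extractor has the same number of parameters `p`, ignoring the classifier and discriminator, and assuming a diagonal Gaussian (one mean and one standard deviation per weight; a full covariance would instead add p(p+1)/2 covariance terms). The function names and the value of `p` are hypothetical.

```python
def shared_private_params(p_extractor: int, num_domains: int) -> int:
    # One shared extractor plus one private extractor per domain.
    return p_extractor * (1 + num_domains)


def stochastic_params(p_extractor: int) -> int:
    # One shared extractor plus a mean and a diagonal std for the private one.
    return p_extractor + 2 * p_extractor


p = 1_000_000  # hypothetical extractor size
for m in (4, 16, 64):
    print(m, shared_private_params(p, m), stochastic_params(p))
# 4 5000000 3000000
# 16 17000000 3000000
# 64 65000000 3000000
```

Under these assumptions the shared-private count grows linearly with the number of domains while the stochastic count stays at 3p; the two break even at two domains, and the stochastic design is smaller from three domains onward.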

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

The proposed model performs strongly on the benchmark datasets with a minimal number of learnable parameters. The design using both shared and private feature extractors is interesting and effective at merging the domains in the latent space. The proposed method is straightforward and easy to understand.

Weaknesses

1. Though the proposal seems effective and achieves strong performance, the model itself still uses a relatively old adversarial backbone, relying on a discriminator to learn domain-invariant features. The two-feature-extractor approach is interesting, but it mainly addresses the parameter increase in the MDTC problem. It would be great to see other design improvements in the model.
2. The performance gain from the proposed model is marginal on the Amazon review/FDU-MTL d

Reviewer 02 · Rating 1 (strong reject) · Confidence 4

Strengths

The paper demonstrates that the authors are well aware of the challenges in MDTC and are familiar with various deep learning tools (such as the reparameterization trick, label smoothing, pseudo-labelling, etc.).

Weaknesses

I have some concerns about this work.
1. Assuming the design of the proposed model is sensible (in fact I have doubts about this; see 2), the work heuristically combines a number of well-known techniques to improve performance. Work primarily of this nature, although potentially valuable in practice, does not have enough novelty to justify publication at ICLR.
2. I have doubts about the "stochastic" part of the proposed approach. Let us track the parameter $W_1$ of the domain-specific fea

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. This paper proposes a novel approach, the Stochastic Adversarial Network, to reduce computational cost while handling a large number of domains.
2. This paper employs a Gaussian distribution to generate private extractors, an original way to avoid the large parameter counts of previous works.
3. This paper conducts numerous experiments to show the effectiveness of the proposed scheme. Moreover, the parameter sensitivity analysis and ablation study demonstrate the rationale of paramete

Weaknesses

1. The motivation is trivial. According to Tables 1 and 9, it is hard to argue that model size is the bottleneck of the training process; a model size of 342.91M is perfectly acceptable today. Further, inference may gain nothing in terms of computational acceleration, as only one private extractor is chosen from the Domain Discriminator D.
2. The baselines are outdated and the improvements on the two benchmarks are limited. According to Tables 2, 3 and 4, it can hardly convince me that the proposed mo


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Digital Media Forensic Detection