SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills

Zhangyin Feng; Yong Dai; Fan Zhang; Duyu Tang; Xiaocheng Feng; Shuangzhi Wu; Bing Qin; Yunbo Cao; Shuming Shi

arXiv:2306.16176·cs.CL·June 29, 2023

TL;DR

SkillNet-X is a multilingual multitask model that sparsely activates skill modules to transfer knowledge across tasks and languages, outperforming task-specific and multitask baselines on eleven NLP datasets.

Contribution

The paper introduces SkillNet-X, a novel multilingual multitask model with sparsely activated skill modules that enhance cross-task and cross-language knowledge sharing.

Findings

01

Outperforms task-specific and multitask baselines on eleven datasets.

02

Skill pre-training further boosts performance across datasets.

03

Significantly outperforms baselines on two new tasks.

Abstract

Traditional multitask learning methods can typically exploit shared knowledge only task-wise or only language-wise, and therefore lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks in different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates the subset of skill modules that are relevant either to the target task or to the target language. Acting as knowledge transit hubs, skill modules are capable of absorbing task-related knowledge and language-related knowledge consecutively. Building on the Transformer, we modify the multi-head attention layer and the feed-forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural…
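The sparse activation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the skill names, the `SkillFFN` class, the layer sizes, and the choice to average the outputs of the active modules are all assumptions made for illustration, and only the feed-forward side is shown.

```python
import random

# Hypothetical skill inventory; the names are illustrative, not from the paper.
TASK_SKILLS = ["classification", "sequence_labeling"]
LANGUAGE_SKILLS = ["en", "zh"]


def make_linear(dim_in, dim_out, rng):
    """Return a random weight matrix as a list of dim_out rows of length dim_in."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)] for _ in range(dim_out)]


def apply_linear(weights, x):
    """Multiply the weight matrix by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class SkillFFN:
    """One feed-forward module per skill; only the requested modules are run."""

    def __init__(self, dim, hidden, skills, seed=0):
        rng = random.Random(seed)
        self.modules = {
            skill: (make_linear(dim, hidden, rng), make_linear(hidden, dim, rng))
            for skill in skills
        }
        self.calls = []  # records which modules were executed

    def forward(self, x, active_skills):
        if not active_skills:
            raise ValueError("at least one skill must be active")
        outputs = []
        for skill in active_skills:
            w_in, w_out = self.modules[skill]
            hidden = [max(0.0, h) for h in apply_linear(w_in, x)]  # ReLU
            outputs.append(apply_linear(w_out, hidden))
            self.calls.append(skill)
        # Assumption: outputs of the active skill modules are averaged.
        return [sum(vals) / len(outputs) for vals in zip(*outputs)]


layer = SkillFFN(dim=4, hidden=8, skills=TASK_SKILLS + LANGUAGE_SKILLS)
# A Chinese classification input activates one task skill and one language skill.
y = layer.forward([1.0, 0.5, -0.5, 2.0], active_skills=["classification", "zh"])
print(layer.calls)
```

In this sketch the modules for `sequence_labeling` and `en` exist but are never evaluated for the example input, which is the sense in which activation is sparse.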

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Label Smoothing · Dense Connections · Adam · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer