Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, and Thien Huu Nguyen

arXiv:2101.03289·cs.CL·October 18, 2021

TL;DR

Trankit is a lightweight, transformer-based multilingual NLP toolkit that offers high performance across many languages and tasks while maintaining efficiency through a novel adapter mechanism.

Contribution

It introduces a memory-efficient, high-performance multilingual NLP toolkit built on a shared transformer model with a novel adapter-based mechanism.

Findings

01

Outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing

02

Maintains efficiency in memory and speed despite using large transformers

03

Provides 90 pretrained pipelines covering 56 languages

Abstract

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines over sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters where a multilingual pretrained transformer is shared across pipelines for different languages. Our…
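The plug-and-play idea in the abstract (one shared multilingual transformer, with small per-language Adapters swapped in and out) can be illustrated with a toy sketch. This is not Trankit's code or API: `SharedEncoder`, `Adapter`, `AdapterPipeline`, the method names, the bottleneck shape, and all sizes are assumptions chosen for illustration, and the "encoder" is a single random matrix standing in for a real pretrained transformer.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class SharedEncoder:
    """Hypothetical stand-in for the shared multilingual transformer."""

    def __init__(self, hidden, rng):
        self.w = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(hidden)]

    def __call__(self, x):
        return matvec(self.w, x)

    def num_params(self):
        return len(self.w) * len(self.w[0])


class Adapter:
    """Assumed bottleneck adapter: x + W_up * relu(W_down * x)."""

    def __init__(self, hidden, bottleneck, rng):
        self.w_down = [[rng.gauss(0, 0.02) for _ in range(hidden)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection, so a fresh adapter is the identity.
        self.w_up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.w_down, x)]
        return [xi + ui for xi, ui in zip(x, matvec(self.w_up, h))]

    def num_params(self):
        return (len(self.w_down) * len(self.w_down[0])
                + len(self.w_up) * len(self.w_up[0]))


class AdapterPipeline:
    """One encoder kept in memory; one small adapter per language."""

    def __init__(self, hidden=16, bottleneck=2, seed=0):
        self.rng = random.Random(seed)
        self.hidden, self.bottleneck = hidden, bottleneck
        self.encoder = SharedEncoder(hidden, self.rng)  # built once, shared
        self.adapters = {}
        self.active = None

    def add(self, language):
        self.adapters[language] = Adapter(self.hidden, self.bottleneck, self.rng)
        if self.active is None:
            self.active = language

    def set_active(self, language):
        if language not in self.adapters:
            raise KeyError(language)
        self.active = language

    def encode(self, x):
        # Shared representation first, then the active language's adapter.
        return self.adapters[self.active](self.encoder(x))
```

In this sketch, adding a language costs only the adapter's parameters (64 here) rather than another copy of the encoder (256 here), which is the kind of memory saving the abstract attributes to sharing the transformer across pipelines. Where adapters sit inside the real model, and how they are trained, is not covered by the text above.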

Code & Models

Repositories

nlp-uoregon/trankit
PyTorch · Official

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adapter · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections