TL;DR
Trankit is a lightweight, transformer-based multilingual NLP toolkit that offers high performance across many languages and tasks while maintaining efficiency through a novel adapter mechanism.
Contribution
Trankit is a memory-efficient, high-performance multilingual NLP toolkit in which a single multilingual pretrained transformer is shared across the pipelines for different languages through a plug-and-play adapter mechanism.
Findings
Outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing
Maintains efficiency in memory and speed despite using large transformers
Provides 90 pretrained pipelines for 56 languages
Abstract
We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines over sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters where a multilingual pretrained transformer is shared across pipelines for different languages. Our…
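The plug-and-play idea in the abstract (one multilingual transformer shared across pipelines, with adapters swapped per language) can be illustrated with a toy sketch. This is not Trankit's code or API: `SharedEncoder`, `Adapter`, `MultilingualPipeline`, and the methods `add`, `set_active`, and `encode` are hypothetical names chosen for illustration, the "encoder" is a single random matrix standing in for a real transformer, and one bottleneck adapter is applied after the encoder, which is a simplification rather than the paper's exact adapter placement.

```python
import math
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; real weights would be learned."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedEncoder:
    """Toy stand-in for the shared multilingual transformer (assumption)."""

    def __init__(self, hidden: int, rng: random.Random) -> None:
        self.hidden = hidden
        self.weight = random_matrix(hidden, hidden, rng)

    def __call__(self, x: Vector) -> Vector:
        return [math.tanh(v) for v in matvec(self.weight, x)]

    def num_params(self) -> int:
        return self.hidden * self.hidden


class Adapter:
    """Bottleneck adapter: h + W_up * relu(W_down * h)."""

    def __init__(self, hidden: int, bottleneck: int, rng: random.Random) -> None:
        self.down = random_matrix(bottleneck, hidden, rng)
        self.up = random_matrix(hidden, bottleneck, rng)

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]
        # Residual connection keeps the encoder output and adds a small correction.
        return [a + b for a, b in zip(h, matvec(self.up, z))]

    def num_params(self) -> int:
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)


class MultilingualPipeline:
    """One shared encoder, one small adapter per language (hypothetical API)."""

    def __init__(self, hidden: int = 16, bottleneck: int = 4, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.hidden = hidden
        self.bottleneck = bottleneck
        self.encoder = SharedEncoder(hidden, self.rng)
        self.adapters: Dict[str, Adapter] = {}
        self.active: Optional[str] = None

    def add(self, lang: str) -> None:
        """Register a language by adding only its adapter."""
        self.adapters[lang] = Adapter(self.hidden, self.bottleneck, self.rng)
        if self.active is None:
            self.active = lang

    def set_active(self, lang: str) -> None:
        """Switch language without reloading the shared encoder."""
        if lang not in self.adapters:
            raise KeyError(f"no adapter registered for {lang!r}")
        self.active = lang

    def encode(self, x: Vector) -> Vector:
        if self.active is None:
            raise RuntimeError("add a language before encoding")
        return self.adapters[self.active](self.encoder(x))

    def num_params(self) -> int:
        return self.encoder.num_params() + sum(
            a.num_params() for a in self.adapters.values()
        )


if __name__ == "__main__":
    pipeline = MultilingualPipeline(hidden=16, bottleneck=4)
    for lang in ("english", "arabic", "vietnamese"):
        pipeline.add(lang)
    shared = pipeline.num_params()
    per_language = pipeline.encoder.num_params() + pipeline.adapters["english"].num_params()
    print("shared encoder + 3 adapters:", shared)
    print("3 separate full models:     ", 3 * per_language)
```

With these toy sizes the shared setup holds 640 parameters against 1152 for three independent copies; the gap widens as the encoder grows relative to the adapters, which is the memory argument the abstract makes.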
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adapter · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections
