SILT: Efficient transformer training for inter-lingual inference
Javier Huertas-Tato, Alejandro Martín, David Camacho

TL;DR
This paper introduces SILT, an efficient transformer architecture that aligns multilingual embeddings for inter-lingual Natural Language Inference, reducing the number of trainable parameters while achieving state-of-the-art results on multilingual NLI benchmarks.
Contribution
SILT is a new architecture that enables efficient inter-lingual NLI by leveraging frozen multilingual transformers and a matrix alignment method, reducing training complexity.
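To make the idea concrete, the following is a minimal sketch of a siamese setup with a frozen encoder and a small trainable head. It is not the authors' implementation: the lookup table standing in for the frozen multilingual transformer, the cosine-similarity alignment matrix, the pooling step, and all names and sizes (FROZEN_TABLE, EMB_DIM, pooled_features, and so on) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8      # hypothetical embedding width
NUM_CLASSES = 3  # entailment, neutral, contradiction

# Stand-in for a frozen multilingual encoder: a fixed table that is never updated.
FROZEN_TABLE = rng.normal(size=(1000, EMB_DIM))


def frozen_encode(token_ids):
    """Return one embedding per token; both branches share these weights (siamese)."""
    return FROZEN_TABLE[np.asarray(token_ids)]


def alignment_matrix(premise_emb, hypothesis_emb):
    """Cosine similarity between every premise token and every hypothesis token."""
    p = premise_emb / np.linalg.norm(premise_emb, axis=1, keepdims=True)
    h = hypothesis_emb / np.linalg.norm(hypothesis_emb, axis=1, keepdims=True)
    return p @ h.T


def pooled_features(align):
    """Fixed-size summary of the alignment matrix (assumed pooling, for illustration)."""
    return np.array([align.max(axis=1).mean(), align.max(axis=0).mean(), align.mean()])


# The only trainable parameters in this sketch: a small linear classifier.
W = rng.normal(scale=0.1, size=(3, NUM_CLASSES))
b = np.zeros(NUM_CLASSES)


def predict_logits(premise_ids, hypothesis_ids):
    align = alignment_matrix(frozen_encode(premise_ids), frozen_encode(hypothesis_ids))
    return pooled_features(align) @ W + b


premise = [12, 7, 431, 88]   # hypothetical token ids of a premise in one language
hypothesis = [5, 912, 33]    # hypothetical token ids of a hypothesis in another language
logits = predict_logits(premise, hypothesis)

trainable = W.size + b.size  # parameters that would receive gradient updates
frozen = FROZEN_TABLE.size   # parameters that stay fixed during training
print(logits.shape, trainable, frozen)
```

In this toy version only the classifier weights would be trained, which illustrates why freezing the encoder keeps the trainable parameter count small relative to the full model.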
Findings
SILT achieves state-of-the-art performance on multilingual NLI benchmarks.
The model drastically reduces the number of trainable parameters.
SILT effectively processes unmatched language pairs in inter-lingual inference.
Abstract
The ability of transformers to perform precision tasks such as question answering, Natural Language Inference (NLI) or summarisation has made them one of the best paradigms for addressing Natural Language Processing (NLP) tasks. NLI is one of the best scenarios for testing these architectures, because of the knowledge required to understand complex sentences and to establish relationships between a hypothesis and a premise. Nevertheless, these models struggle to generalise to other domains and to handle multilingual and inter-lingual scenarios. The leading approach in the literature to address these issues involves designing and training extremely large architectures, which leads to unpredictable behaviours and raises barriers that impede broad access and fine-tuning. In this paper, we propose a new architecture called Siamese Inter-Lingual Transformer…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Adam · Layer Normalization · Label Smoothing
