SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry

TL;DR
SUTRA is a scalable, efficient multilingual language model architecture that outperforms existing models on multilingual benchmarks and can access internet knowledge for accurate, up-to-date responses.
Contribution
Introduces SUTRA, a novel multilingual LLM architecture with decoupled core understanding and language-specific processing, enhancing scalability and performance.
Findings
Surpasses GPT-3.5 and Llama2 by 20-30% on MMLU benchmarks for multilingual tasks.
Demonstrates efficient multilingual alignment and learning.
Provides up-to-date, factual responses using internet knowledge.
Abstract
In this paper, we introduce SUTRA, a multilingual Large Language Model architecture capable of understanding, reasoning, and generating text in over 50 languages. SUTRA's design uniquely decouples core conceptual understanding from language-specific processing, which facilitates scalable and efficient multilingual alignment and learning. Employing a Mixture of Experts framework in both language and concept processing, SUTRA demonstrates both computational efficiency and responsiveness. Through extensive evaluations, SUTRA is shown to surpass existing models such as GPT-3.5 and Llama2 by 20-30% on leading Massive Multitask Language Understanding (MMLU) benchmarks for multilingual tasks. SUTRA models are also online LLMs that can use knowledge from the internet to provide hallucination-free, factual and up-to-date responses while retaining their multilingual capabilities. Furthermore, we…
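The abstract names a Mixture of Experts framework but does not describe its routing. As a rough illustration of the general idea only, the following is a minimal sketch of generic top-k gating, assuming a softmax gate and scalar toy experts. The function names (`top_k_route`, `moe_forward`), the toy experts, and the gate logits are all hypothetical and are not taken from SUTRA.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Select the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts' outputs for a scalar input x."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical toy experts and gate scores (assumptions for illustration).
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]

weights = top_k_route(gate_logits, k=2)
y = moe_forward(3.0, experts, gate_logits, k=2)
print(weights, y)
```

Only the selected experts are evaluated for a given input, which is the usual source of the computational savings associated with sparse Mixture of Experts layers.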
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Cosine Annealing · Dropout · Linear Warmup With Cosine Annealing · Residual Connection · Byte Pair Encoding · Adam · Softmax · Attention Is All You Need
