Amharic LLaMA and LLaVA: Multimodal LLMs for Low Resource Languages

Michael Andersland

arXiv:2403.06354 · cs.CL · March 12, 2024 · 2 cites

Open Access · 1 Repo · 2 Models · 5 Datasets

TL;DR

This paper presents the development of Amharic LLaMA and LLaVA, multimodal large language models tailored for the low-resource Amharic language, utilizing data augmentation and visual instruction tuning to enhance performance.

Contribution

The work introduces the first multimodal LLM for Amharic, combining data augmentation, open-source translation, and visual instruction tuning techniques for low-resource languages.

Findings

01. Achieved improved language understanding in Amharic.

02. Developed a multimodal model capable of understanding images and text.

03. Open sourced models and dataset for community use.

Abstract

Large Language Models (LLMs) like GPT-4 and LLaMA have shown incredible proficiency at natural language processing tasks and have even begun to excel at tasks across other modalities such as vision and audio. Despite their success, LLMs often struggle to perform well on low-resource languages because there is so little training data available. This shortcoming is especially prevalent with open-source models. In this work, we explore training LLaMA-2 to speak Amharic, a language spoken by over 50 million people worldwide that has orders of magnitude less data available than languages like English. We employ methods previously used for training LLMs on other languages with data scarcity, and use open-source translation models to perform data augmentation and grow our dataset from millions of tokens to billions. We further enhance the capabilities of our model by connecting an…

Code & Models

Repositories

iocuydi/amharic-llama-llava (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Lexicography and Language Studies

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Dropout · Softmax · Residual Connection