Cosmos-LLaVA: Chatting with the Visual (Cosmos-LLaVA: Görselle Sohbet Etmek)

Ahmed Zeer; Eren Dogan; Yusuf Erdem; Elif Ince; Osama Shbib; M. Egemen Uzun; Atahan Uz; M. Kaan Yuce; H. Toprak Kesgin; M. Fatih Amasyali

arXiv:2412.02760·cs.AI·December 5, 2024

TL;DR

This paper introduces Cosmos-LLaVA, a Turkish visual instruction model built by combining large language models with image encoders and aimed at closing gaps in Turkish-language capability. It analyzes how architecture and dataset choices affect the model's performance.

Contribution

The study develops Cosmos-LLaVA, a novel Turkish visual instruction model, and provides an in-depth analysis of how different architectures and datasets influence its performance.

Findings

01 Model architecture significantly impacts performance.

02 Dataset selection plays a crucial role in model effectiveness.

03 Fine-tuning with various datasets improves Turkish visual instruction capabilities.

Abstract

In this study, a Turkish visual instruction model was developed, and various model architectures and dataset combinations were analysed to improve its performance. The Cosmos-LLaVA model, built by combining different large language models and image encoders, is designed to address gaps in Turkish-language capability. The experiments analyse in detail how fine-tuning on various datasets affects model performance. The results show that model architecture and dataset selection have a significant impact on performance.
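The abstract describes the model as a combination of a large language model and an image coder (encoder) but gives no wiring details. The sketch below shows how LLaVA-style models are commonly assembled: image patch features are projected into the language model's embedding space and prepended to the text token embeddings. Every name, dimension, and the single linear projector here is an illustrative assumption, not the authors' configuration.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random values (stand-in for real weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(features, weight):
    """Linear projection: (n, d_in) times (d_in, d_out) gives (n, d_out)."""
    columns = list(zip(*weight))  # d_out columns, each of length d_in
    return [[sum(f * w for f, w in zip(row, col)) for col in columns]
            for row in features]


def build_llm_input(image_features, text_embeddings, projector):
    """Map image features into the LLM embedding space, then prepend them to the text."""
    visual_tokens = project(image_features, projector)
    return visual_tokens + text_embeddings


# Hypothetical sizes chosen only to keep the example small.
VISION_DIM, LLM_DIM = 8, 16
NUM_PATCHES, NUM_TEXT_TOKENS = 4, 3

rng = random.Random(0)
image_features = make_matrix(NUM_PATCHES, VISION_DIM, rng)    # image encoder output (assumed)
text_embeddings = make_matrix(NUM_TEXT_TOKENS, LLM_DIM, rng)  # LLM token embeddings (assumed)
projector = make_matrix(VISION_DIM, LLM_DIM, rng)             # trainable connector (assumed)

sequence = build_llm_input(image_features, text_embeddings, projector)
print(len(sequence), len(sequence[0]))  # 7 16
```

In this kind of design, swapping the language model or the image encoder mainly changes the two dimensions the projector has to bridge, which is one reason architecture choices can be compared systematically.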
