Is Open Source the Future of AI? A Data-Driven Approach

Domen Vake; Bogdan Šinik; Jernej Vičič; Aleksandar Tošić

arXiv:2501.16403·cs.SE·January 29, 2025

TL;DR

This paper uses data-driven analysis to evaluate the role of open-source development in advancing large language models, highlighting its benefits and challenges for future AI model dissemination.

Contribution

It provides empirical data on open-source contributions to LLMs, informing debates on open versus proprietary AI development strategies.

Findings

01. Open-source contributions improve model performance.

02. Open models tend to be smaller with acceptable accuracy loss.

03. Community engagement positively influences open-source LLM development.

Abstract

Large Language Models (LLMs) have become central in academia and industry, raising concerns about privacy, transparency, and misuse. A key issue is the trustworthiness of proprietary models, with open-sourcing often proposed as a solution. However, open-sourcing presents challenges, including potential misuse, financial disincentives, and intellectual property concerns. Proprietary models, backed by private sector resources, are better positioned for return on investment. There are also other approaches that lie somewhere on the spectrum between completely open-source and proprietary. These can largely be categorised into open-source usage limitations protected by licensing, partially open-source (open weights) models, and hybrid approaches in which obsolete model versions are open-sourced while competitive versions with market value remain proprietary. Currently, discussions on where on…

Taxonomy

Topics: Big Data and Business Intelligence · Scientific Computing and Data Management