Is Open Source the Future of AI? A Data-Driven Approach
Domen Vake, Bogdan Šinik, Jernej Vičič, Aleksandar Tošić

TL;DR
This paper uses data-driven analysis to evaluate the role of open-source development in advancing large language models, highlighting its benefits and challenges for future AI model dissemination.
Contribution
It provides empirical data on open-source contributions to LLMs, informing debates on open versus proprietary AI development strategies.
Findings
Open-source contributions improve model performance.
Open models tend to be smaller with acceptable accuracy loss.
Community engagement positively influences open-source LLM development.
Abstract
Large Language Models (LLMs) have become central in academia and industry, raising concerns about privacy, transparency, and misuse. A key issue is the trustworthiness of proprietary models, with open-sourcing often proposed as a solution. However, open-sourcing presents challenges, including potential misuse, financial disincentives, and intellectual property concerns. Proprietary models, backed by private sector resources, are better positioned for return on investment. Other approaches lie on the spectrum between completely open-source and proprietary. These can largely be categorised into open-source models with usage limitations protected by licensing, partially open-source (open weights) models, and hybrid approaches where obsolete model versions are open-sourced while competitive versions with market value remain proprietary. Currently, discussions on where on…
Taxonomy
Topics: Big Data and Business Intelligence · Scientific Computing and Data Management
