Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

Andreas Liesenfeld; Alianda Lopez; Mark Dingemanse

arXiv:2307.05532·cs.CL·July 13, 2023

TL;DR

This paper assesses the openness, transparency, and accountability of instruction-tuned large language models, highlighting disparities in openness levels and emphasizing the importance of scientific documentation for responsible AI development.

Contribution

It provides a systematic evaluation of open-source LLM projects, documenting degrees of openness and revealing gaps in transparency and scientific rigor.

Findings

01 Many open-source projects use undocumented or legally dubious data.

02 Few projects share instruction-tuning data or detailed documentation.

03 Openness levels impact fairness and accountability in AI models.

Abstract

Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open…

Code & Models

Repositories

opening-up-chatgpt/opening-up-chatgpt.github.io (official)
