Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application

Daniel Hopp

arXiv:2407.16896 · cs.CY · July 25, 2024

TL;DR

UNCTAD developed an open-source Retrieval Augmented Generation LLM application to enhance its domain-specific AI capabilities, reducing reliance on costly proprietary solutions and fostering institutional knowledge.

Contribution

The paper presents the development of an open-source RAG LLM application tailored for UNCTAD's needs, including libraries for document processing, local LLM deployment, and user interface, with publicly available code.
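
To make the retrieve-then-generate idea behind a RAG application concrete, here is a minimal sketch of the retrieval and prompt-assembly steps. It is not the paper's code and does not use UNCTAD's libraries: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the bag-of-words similarity, and the sample text chunks are all assumptions made for illustration. A real deployment would typically use learned embeddings and would pass the assembled prompt to a locally hosted LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return up to k chunks most similar to the query, best first.

    Chunks with zero similarity are dropped. Bag-of-words similarity is a
    stand-in (an assumption) for the embedding search a real system would use.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to an LLM (hypothetical template)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented sample chunks, standing in for text extracted from documents.
chunks = [
    "Merchandise exports are reported quarterly in the trade database.",
    "The cafeteria is open from noon until two.",
    "Nowcasts of global trade are published every quarter.",
]

query = "How often are trade nowcasts published?"
top_chunks = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

The design point is that the model only sees the retrieved chunks, so answers can be grounded in an organization's own documents without retraining the model.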

Findings

1. Open-source libraries facilitate domain-specific AI development.

2. In-house RAG LLM reduces costs and increases flexibility.

3. Tools are publicly available for broader use.

Abstract

Generative artificial intelligence (AI), and in particular Large Language Models (LLMs), have exploded in popularity and attention since the release to the public of ChatGPT's Generative Pre-trained Transformer (GPT)-3.5 model in November of 2022. Due to the power of these general purpose models and their ability to communicate in natural language, they can be useful in a range of domains, including the work of official statistics and international organizations. However, with such a novel and seemingly complex technology, it can feel as if generative AI is something that happens to an organization, something that can be talked about but not understood, that can be commented on but not contributed to. Additionally, the costs of adoption and operation of proprietary solutions can be both uncertain and high, a barrier for often cost-constrained international organizations. In the face of…
