Evaluating Large Language Models with Human Feedback: Establishing a Swedish Benchmark

Birger Moell

arXiv:2405.14006 · cs.CL · May 24, 2024

TL;DR

This paper introduces a Swedish language benchmark using human feedback to evaluate and compare the performance of various large language models, addressing a gap in resources for less-represented languages.

Contribution

It presents a new Swedish benchmark with human feedback for LLM evaluation and releases a tool to facilitate future research and model assessment in Swedish.

Findings

1. Evaluated 11 large language models on Swedish language tasks.

2. Provided a publicly available benchmark tool for Swedish LLM assessment.

3. Aims to establish a leaderboard for Swedish language models.

Abstract

In the rapidly evolving field of artificial intelligence, large language models (LLMs) have demonstrated significant capabilities across numerous applications. However, the performance of these models in languages with fewer resources, such as Swedish, remains under-explored. This study introduces a comprehensive human benchmark to assess the efficacy of prominent LLMs in understanding and generating Swedish language texts using forced choice ranking. We employ a modified version of the ChatbotArena benchmark, incorporating human feedback to evaluate eleven different models, including GPT-4, GPT-3.5, various Claude and Llama models, and bespoke models like Dolphin-2.9-llama3b-8b-flashback and BeagleCatMunin. These models were chosen based on their performance on LMSYS chatbot arena and the Scandeval benchmarks. We release the chatbotarena.se benchmark as a tool to improve our…
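Example

The forced-choice comparisons described in the abstract can be turned into a leaderboard in several ways. The sketch below uses a plain Elo update, which is one common choice for arena-style pairwise votes; the excerpt above does not say which rating scheme the paper uses, so treat this as an assumption. The function names, the K-factor of 32, the starting rating of 1000, and the placeholder model names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """Elo-predicted probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_from_votes(votes, k=32.0, initial=1000.0):
    """Compute Elo ratings from (winner, loser) forced-choice votes.

    k and initial are illustrative defaults, not values from the paper.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        # The winner gains more when the win was less expected.
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, rejected model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]

ratings = elo_from_votes(votes)
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on the order in which votes arrive. A Bradley-Terry fit over all votes at once is an order-independent alternative, at the cost of a slightly more involved estimation step.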

Code & Models

Repositories

BirgerMoell/SwedishLLMBenchmark (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings