Evaluating Large Language Models with Human Feedback: Establishing a Swedish Benchmark
Birger Moell

TL;DR
This paper introduces a Swedish language benchmark using human feedback to evaluate and compare the performance of various large language models, addressing a gap in resources for less-represented languages.
Contribution
It presents a new Swedish benchmark with human feedback for LLM evaluation and releases a tool to facilitate future research and model assessment in Swedish.
Findings
Evaluated 11 large language models on Swedish language tasks.
Provided a publicly available benchmark tool for Swedish LLM assessment.
Aims to establish a leaderboard for Swedish language models.
Abstract
In the rapidly evolving field of artificial intelligence, large language models (LLMs) have demonstrated significant capabilities across numerous applications. However, the performance of these models in languages with fewer resources, such as Swedish, remains under-explored. This study introduces a comprehensive human benchmark to assess the efficacy of prominent LLMs in understanding and generating Swedish-language text, using forced-choice ranking. We employ a modified version of the ChatbotArena benchmark, incorporating human feedback to evaluate eleven different models, including GPT-4, GPT-3.5, various Claude and Llama models, and bespoke models like Dolphin-2.9-llama3b-8b-flashback and BeagleCatMunin. These models were chosen based on their performance on the LMSYS Chatbot Arena and the ScandEval benchmarks. We release the chatbotarena.se benchmark as a tool to improve our…
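The forced-choice setup described in the abstract produces pairwise votes: a rater sees two model answers and picks one. A common way to turn such votes into a leaderboard is an Elo-style rating update. The sketch below illustrates that idea only; the abstract does not say which rating method chatbotarena.se uses, and the function names, the K-factor of 32, the base rating of 1000, and the model names are all assumptions made for illustration.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from forced-choice votes.

    votes: iterable of (winner, loser) model-name pairs.
    k and base are illustrative defaults, not values from the paper.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, under the Elo logistic model.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An unexpected win moves ratings more than an expected one.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def leaderboard(ratings):
    """Return (model, rating) pairs sorted from highest to lowest rating."""
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes; the model names are placeholders, not results from the study.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = elo_ratings(votes)
for name, score in leaderboard(ratings):
    print(f"{name}: {score:.1f}")
```

Sequential Elo updates depend on the order in which votes arrive, so a leaderboard built this way can shift slightly if the same votes are replayed in a different order. Fitting all votes jointly, for example with a Bradley-Terry model, avoids that at the cost of a heavier computation.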
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings
