Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens

Mathieu Chartier; Nabil Dakkoune; Guillaume Bourgeois; Stéphane Jean

arXiv:2406.15173·cs.IR·June 24, 2024

Open Access

TL;DR

This study evaluates the ability of various large language models to accurately and reliably answer history questions in French, revealing significant shortcomings in accuracy, language handling, and response consistency.

Contribution

It provides a systematic assessment of LLMs' performance on French history questions, highlighting their limitations in accuracy and language quality.

Findings

1. LLMs show overall insufficient accuracy in historical responses.
2. Responses exhibit uneven handling of the French language.
3. Responses are often verbose and inconsistent.

Abstract

Large Language Models (LLMs) like ChatGPT or Bard have revolutionized information retrieval and captivated the public with their ability to generate custom responses in record time, regardless of the topic. In this article, we assess the capabilities of various LLMs in producing reliable, comprehensive, and sufficiently relevant responses about historical facts in French. To achieve this, we constructed a testbed comprising numerous history-related questions of varying types, themes, and levels of difficulty. Our evaluation of responses from ten selected LLMs reveals numerous shortcomings in both substance and form. Beyond an overall insufficient accuracy rate, we highlight uneven treatment of the French language, as well as issues related to verbosity and inconsistency in the responses provided by LLMs.

Taxonomy

Topics: Natural Language Processing Techniques