Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation

David Beauchemin; Richard Khoury

arXiv:2603.07825 · cs.CL · March 10, 2026

TL;DR

This paper evaluates large language models for Quebec insurance advisory tasks, highlighting the importance of reasoning, retrieval methods, and model size in achieving expert-level accuracy while addressing deployment challenges.

Contribution

Introduces the AEPC-QA benchmark, a private set of 807 multiple-choice questions, and evaluates 51 LLMs on insurance advisory tasks, reporting how reasoning, retrieval, and model size affect accuracy.

Findings

01. Chain-of-thought reasoning improves accuracy.

02. Retrieval-Augmented Generation boosts knowledge but can cause distractions.

03. Large generalist models outperform smaller domain-specific ones.

Abstract

The digitization of insurance distribution in the Canadian province of Quebec, accelerated by legislative changes such as Bill 141, has created a significant "advice gap", leaving consumers to interpret complex financial contracts without professional guidance. While Large Language Models (LLMs) offer a scalable solution for automated advisory services, their deployment in high-stakes domains hinges on strict legal accuracy and trustworthiness. In this paper, we address this challenge by introducing AEPC-QA, a private gold-standard benchmark of 807 multiple-choice questions derived from official regulatory certification (paper) handbooks. We conduct a comprehensive evaluation of 51 LLMs across two paradigms: closed-book generation and retrieval-augmented generation (RAG) using a specialized corpus of Quebec insurance documents. Our results reveal three critical insights: 1) the…
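To make the two paradigms named in the abstract concrete, here is a minimal Python sketch of how a multiple-choice item could be scored in a closed-book setting and in a retrieval-augmented one. It is an illustration under stated assumptions, not the authors' evaluation harness: the `MCQuestion` schema, the prompt wording, and the `model` and `retrieve` callables are all hypothetical stand-ins, and the real AEPC-QA format and corpus are not described in the text available here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MCQuestion:
    """One multiple-choice item (hypothetical schema, not the AEPC-QA format)."""
    stem: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # gold option letter


def build_prompt(q: MCQuestion, passages: Sequence[str] = ()) -> str:
    """Closed-book prompt when `passages` is empty, RAG-style prompt otherwise."""
    parts = []
    if passages:
        # Retrieved passages are prepended as context (assumed prompt layout).
        parts.append("Context:\n" + "\n".join(f"- {p}" for p in passages))
    parts.append("Question: " + q.stem)
    parts.extend(f"{letter}. {text}" for letter, text in sorted(q.options.items()))
    parts.append("Answer with a single letter.")
    return "\n".join(parts)


def accuracy(
    questions: Sequence[MCQuestion],
    model: Callable[[str], str],
    retrieve: Optional[Callable[[str], Sequence[str]]] = None,
) -> float:
    """Fraction of questions whose first reply character matches the gold letter."""
    if not questions:
        raise ValueError("questions must not be empty")
    correct = 0
    for q in questions:
        # No retriever means closed-book; otherwise fetch passages for the stem.
        passages = retrieve(q.stem) if retrieve else ()
        reply = model(build_prompt(q, passages)).strip().upper()
        correct += reply[:1] == q.answer
    return correct / len(questions)
```

Calling `accuracy(questions, model)` gives the closed-book score, while passing a `retrieve` function gives the retrieval-augmented score for the same model, so the two numbers can be compared item for item. Matching only the first character of the reply is a simplification; a chain-of-thought setup would need to parse the final answer out of a longer response.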

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education