Benchmarking Large Language Models for Quebec Insurance: From Closed-Book to Retrieval-Augmented Generation
David Beauchemin, Richard Khoury

TL;DR
This paper evaluates large language models for Quebec insurance advisory tasks, highlighting the importance of reasoning, retrieval methods, and model size in achieving expert-level accuracy while addressing deployment challenges.
Contribution
Introduces the AEPC-QA benchmark and evaluates 51 LLMs on insurance advisory questions, showing how reasoning, retrieval, and model size affect accuracy.
Findings
Chain-of-thought reasoning improves accuracy.
Retrieval-augmented generation (RAG) supplies missing domain knowledge but can also distract a model with the retrieved context.
Large generalist models outperform smaller domain-specific ones.
Abstract
The digitization of insurance distribution in the Canadian province of Quebec, accelerated by legislative changes such as Bill 141, has created a significant "advice gap", leaving consumers to interpret complex financial contracts without professional guidance. While Large Language Models (LLMs) offer a scalable solution for automated advisory services, their deployment in high-stakes domains hinges on strict legal accuracy and trustworthiness. In this paper, we address this challenge by introducing AEPC-QA, a private gold-standard benchmark of 807 multiple-choice questions derived from official regulatory certification (paper) handbooks. We conduct a comprehensive evaluation of 51 LLMs across two paradigms: closed-book generation and retrieval-augmented generation (RAG) using a specialized corpus of Quebec insurance documents. Our results reveal three critical insights: 1) the…
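The two evaluation paradigms named in the abstract can be illustrated with a minimal sketch of multiple-choice scoring. This is not the authors' evaluation harness: the `MCQuestion` schema, the prompt wording, and the `model` and `retrieve` callables are all hypothetical placeholders, and AEPC-QA itself is private, so no real data is shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MCQuestion:
    """One multiple-choice item (hypothetical schema, not the AEPC-QA format)."""
    stem: str
    choices: Sequence[str]  # answer options, labelled A, B, C, ... in order
    answer: str             # gold label, e.g. "B"


def build_prompt(q: MCQuestion, context: Optional[str] = None) -> str:
    """Closed-book prompt when context is None, RAG-style prompt otherwise."""
    labels = "ABCDEFGH"
    lines = []
    if context is not None:
        # Retrieved passages are simply prepended (an assumed prompt layout).
        lines.append("Context:\n" + context + "\n")
    lines.append("Question: " + q.stem)
    lines.extend(f"{labels[i]}. {c}" for i, c in enumerate(q.choices))
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def accuracy(
    questions: Sequence[MCQuestion],
    model: Callable[[str], str],
    retrieve: Optional[Callable[[str], str]] = None,
) -> float:
    """Fraction of questions whose first reply character matches the gold label.

    `model` maps a prompt to a reply; `retrieve` maps a question stem to
    context text. Passing retrieve=None gives the closed-book setting.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    correct = 0
    for q in questions:
        context = retrieve(q.stem) if retrieve is not None else None
        reply = model(build_prompt(q, context)).strip().upper()
        if reply[:1] == q.answer.upper():
            correct += 1
    return correct / len(questions)
```

Running `accuracy` twice on the same question set, once with `retrieve=None` and once with a retriever, gives the closed-book versus RAG comparison for a single model. A negative difference between the two scores would correspond to the distraction effect listed under Findings.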
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education
