Design, Results and Industry Implications of the World's First Insurance Large Language Model Evaluation Benchmark
Hua Zhou (Central University of Finance and Economics), Bing Ma (Central University of Finance and Economics), Yufei Zhang (Zetavision AI Lab), Yi Zhao (Zetavision AI Lab)

TL;DR
This paper introduces CUFEInse v1.0, a comprehensive evaluation benchmark for insurance-focused large language models. It assesses theoretical knowledge, industry understanding, safety and compliance, intelligent agent application, and logical rigor, and discusses the implications for academia and industry.
Contribution
It presents the first systematic, multi-dimensional evaluation framework for insurance LLMs, filling a critical gap in professional benchmarks and guiding model development in vertical domains.
Findings
General-purpose models show weak actuarial and compliance skills.
Domain-specific training improves insurance scenario performance.
Current models struggle with professional reasoning and compliance tasks.
Abstract
This paper details the construction methodology, multi-dimensional evaluation system, and underlying design philosophy of CUFEInse v1.0. Following the principles of "quantitative-oriented, expert-driven, and multi-validation," the benchmark establishes an evaluation framework of 5 core dimensions, 54 sub-indicators, and 14,430 high-quality questions, covering insurance theoretical knowledge, industry understanding, safety and compliance, intelligent agent application, and logical rigor. Using this benchmark, 11 mainstream large language models were evaluated. The results show that general-purpose models share common bottlenecks, such as weak actuarial capabilities and inadequate compliance adaptation. High-quality domain-specific training demonstrates significant advantages in insurance vertical…
Taxonomy
Topics: Insurance and Financial Risk Management · Big Data and Digital Economy · Explainable Artificial Intelligence (XAI)
