# CAMB: A comprehensive industrial LLM benchmark on civil aviation maintenance

**Authors:** Feng Zhang, Chengjie Pang, Yuehan Zhang, Chenyu Luo

arXiv: 2508.20420 · 2025-08-29

## TL;DR

This paper introduces CAMB, a specialized benchmark for evaluating large language models in civil aviation maintenance, addressing a critical gap in domain-specific reasoning and knowledge assessment.

## Contribution

CAMB is an industrial-grade, standardized benchmark for evaluating LLMs on civil aviation maintenance tasks. By exposing where models fall short, it enables targeted improvements and domain-specific model tuning.

## Key findings

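- The benchmark identifies specific gaps in LLMs' domain knowledge and complex reasoning.
- Evaluations of well-known vector embedding models and LLMs reveal their strengths and weaknesses in civil aviation maintenance scenarios.
- The benchmark and evaluation code are open-sourced to facilitate further research and development.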

## Abstract

Civil aviation maintenance is a domain characterized by stringent industry standards. Within this field, maintenance procedures and troubleshooting represent critical, knowledge-intensive tasks that require sophisticated reasoning. To address the lack of specialized evaluation tools for large language models (LLMs) in this vertical, we propose and develop an industrial-grade benchmark specifically designed for civil aviation maintenance. This benchmark serves a dual purpose. First, it provides a standardized tool to measure LLM capabilities within civil aviation maintenance, identifying specific gaps in domain knowledge and complex reasoning. Second, by pinpointing these deficiencies, it establishes a foundation for targeted improvement efforts (e.g., domain-specific fine-tuning, RAG optimization, or specialized prompt engineering), ultimately facilitating progress toward more intelligent solutions within civil aviation maintenance. Our work addresses a significant gap in current LLM evaluation, which primarily focuses on mathematical and coding reasoning tasks. In addition, given that Retrieval-Augmented Generation (RAG) systems are currently the dominant solutions in practical applications, we leverage this benchmark to evaluate existing well-known vector embedding models and LLMs for civil aviation maintenance scenarios. Through experimental exploration and analysis, we demonstrate the effectiveness of our benchmark in assessing model performance within this domain, and we open-source this evaluation benchmark and code to foster further research and development: https://github.com/CamBenchmark/cambenchmark
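The abstract mentions using the benchmark to evaluate vector embedding models for RAG. The paper's actual metrics and evaluation pipeline are in the linked repository and are not reproduced here. The following is only a generic, hedged sketch of one common way to score an embedding retriever: rank passages by cosine similarity to each query, then average Recall@k and mean reciprocal rank (MRR) over labeled queries. Several details are illustrative assumptions and are not CAMB's API or data: the choice of these two metrics, the function names (`rank_passages`, `evaluate_retriever`, and so on), and the toy 2-D vectors.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Sequence[float], passage_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return passage IDs sorted by descending cosine similarity to the query."""
    return sorted(passage_vecs, key=lambda pid: cosine(query_vec, passage_vecs[pid]), reverse=True)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for pid in ranked[:k] if pid in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant passage, or 0.0 if none was retrieved."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retriever(results: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Average Recall@k and MRR over all queries that have relevance labels."""
    n = len(qrels)
    recall = sum(recall_at_k(results.get(q, []), rel, k) for q, rel in qrels.items()) / n
    mrr = sum(reciprocal_rank(results.get(q, []), rel) for q, rel in qrels.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Toy, hypothetical 2-D "embeddings"; a real run would use an embedding model's vectors.
passages = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
queries = {"q1": [1.0, 0.1], "q2": [0.0, 1.0]}
qrels = {"q1": {"p3"}, "q2": {"p2"}}  # hypothetical relevance labels

results = {q: rank_passages(vec, passages) for q, vec in queries.items()}
print(evaluate_retriever(results, qrels, k=1))  # {'recall@1': 0.5, 'mrr': 0.75}
```

In this toy run, `q1` ranks its relevant passage `p3` second, so it contributes 0 to Recall@1 and 0.5 to MRR. Together the two metrics separate "found it at all" from "found it near the top," which matters when only the top few passages are passed to the LLM.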

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20420/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20420/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/2508.20420/full.md

---
Source: https://tomesphere.com/paper/2508.20420