Statistical Confidence in Functional Correctness: An Approach for AI Product Functional Correctness Evaluation

Wallace Albertini; Marina Condé Araújo; Júlia Condé Araújo; Antonio Pedro Santos Alves; Marcos Kalinowski

arXiv:2602.18357·cs.SE·February 23, 2026

Open Access

TL;DR

This paper introduces a statistically robust approach called SCFC for evaluating AI systems' functional correctness, linking business requirements to confidence measures that account for performance variability.

Contribution

It presents a novel method for assessing AI functional correctness using statistical confidence, bridging the gap between theoretical standards and practical evaluation.

Findings

01

The SCFC approach provides a feasible way to quantify confidence in AI correctness.

02

Case studies show the approach's utility and ease of use in industry.

03

Experts find the method valuable for practical AI quality assessment.

Abstract

The quality assessment of Artificial Intelligence (AI) systems is a fundamental challenge due to their inherently probabilistic nature. Standards such as ISO/IEC 25059 provide a quality model, but they lack practical and statistically robust methods for assessing functional correctness. This paper proposes and evaluates the Statistical Confidence in Functional Correctness (SCFC) approach, which seeks to fill this gap by connecting business requirements to a measure of statistical confidence that considers both the model's average performance and its variability. The approach consists of four steps: defining quantitative specification limits, performing stratified and probabilistic sampling, applying bootstrapping to estimate a confidence interval for the performance metric, and calculating a capability index as a final indicator. The approach was evaluated through a case study on two…
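The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the capability index formula, so the sketch assumes a one-sided, Cpk-style index, (mean - lower limit) / (3 * standard deviation), computed over the bootstrap replicates. The function names, the `(stratum, correct)` record layout, the 10% sampling fraction, and the 0.85 lower specification limit are all hypothetical choices made for illustration.

```python
import random
import statistics


def stratified_sample(records, key, fraction, rng):
    """Draw a proportional random sample from each stratum (assumed scheme)."""
    strata = {}
    for record in records:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for items in strata.values():
        k = max(1, round(fraction * len(items)))  # at least one item per stratum
        sample.extend(rng.sample(items, k))
    return sample


def bootstrap_accuracy(outcomes, n_resamples=2000, seed=0):
    """Resample 0/1 outcomes with replacement; return one accuracy per resample."""
    rng = random.Random(seed)
    n = len(outcomes)
    replicates = []
    for _ in range(n_resamples):
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        replicates.append(sum(resample) / n)
    return replicates


def percentile_interval(replicates, alpha=0.05):
    """Percentile bootstrap confidence interval at level 1 - alpha."""
    ordered = sorted(replicates)
    last = len(ordered) - 1
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[min(last, int((1 - alpha / 2) * len(ordered)))]
    return lower, upper


def capability_index(replicates, lower_spec_limit):
    """Assumed one-sided index: (mean - LSL) / (3 * sd). Not taken from the paper."""
    mean = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)
    return (mean - lower_spec_limit) / (3 * sd)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical evaluation log: (stratum, 1 if the prediction was correct else 0).
    log = [("A", int(rng.random() < 0.92)) for _ in range(800)]
    log += [("B", int(rng.random() < 0.88)) for _ in range(200)]

    lower_spec_limit = 0.85                                       # step 1 (hypothetical)
    sample = stratified_sample(log, lambda r: r[0], 0.5, rng)     # step 2
    replicates = bootstrap_accuracy([r[1] for r in sample])       # step 3
    print("95% CI:", percentile_interval(replicates))
    print("index:", capability_index(replicates, lower_spec_limit))  # step 4
```

Under this assumed definition, the index grows when the mean accuracy sits further above the specification limit and shrinks when the bootstrap replicates are more spread out, which is how a single number can reflect both average performance and variability.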

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Safety Systems Engineering in Autonomy