Effort-Optimized, Accuracy-Driven Labelling and Validation of Test Inputs for DL Systems: A Mixed-Integer Linear Programming Approach

Mohammad Hossein Amini; Mehrdad Sabetzadeh; Shiva Nejati

arXiv:2507.04990 · cs.CV · March 31, 2026

TL;DR

This paper introduces OPAL, a MILP-based human-assisted labelling method for DL systems that targets high accuracy with minimal manual effort, validated through extensive experiments on vision datasets.

Contribution

OPAL is a novel MILP formulation that optimizes labelling effort to achieve a desired accuracy level, outperforming baseline methods in efficiency and accuracy.

Findings

01. OPAL achieves an average accuracy of 98.8% while halving manual labelling effort.

02. OPAL outperforms automated labelling baselines in accuracy across nine datasets.

03. Active learning with OPAL further reduces manual effort by 4.5% without losing accuracy.

Abstract

Software systems increasingly include AI components based on deep learning (DL). Reliable testing of such systems requires near-perfect test-input validity and label accuracy, with minimal human effort. Yet the DL community has largely overlooked the need to build highly accurate datasets with minimal effort, since DL training is generally tolerant of labelling errors. This challenge instead reflects concerns more familiar to software engineering, where a central goal is to construct high-accuracy test inputs, with accuracy as close to 100% as possible, while keeping the associated costs in check. In this article, we introduce OPAL, a human-assisted labelling method that can be configured to target a desired accuracy level while minimizing the manual effort required for labelling. The main contribution of OPAL is a mixed-integer linear programming (MILP) formulation that minimizes…
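The trade-off the abstract describes, reaching a target accuracy with as little manual labelling as possible, can be illustrated with a toy binary selection problem. The sketch below is not OPAL's MILP formulation, which this page does not reproduce. It assumes a hypothetical per-input confidence that an automated label is correct, a unit cost per manual label, and that manual labels are always correct. Under those assumptions, the minimum-effort selection is found by manually labelling the least confident inputs first; the function name and inputs are illustrative only.

```python
from typing import List, Tuple


def select_for_manual_labelling(
    confidences: List[float], target: float
) -> Tuple[List[int], float]:
    """Pick the fewest inputs to label manually so that the expected
    label accuracy over the whole set reaches `target`.

    Assumptions (hypothetical, not taken from the paper):
      - confidences[i] is the probability that input i's automated
        label is correct;
      - a manual label is always correct and costs one unit of effort.
    """
    n = len(confidences)
    if n == 0:
        return [], 1.0

    # Expected number of correct labels if nothing is labelled manually.
    expected_correct = sum(confidences)

    # With unit costs, the largest gain per unit of effort comes from the
    # least confident input, so visit inputs in ascending confidence.
    order = sorted(range(n), key=lambda i: confidences[i])

    chosen: List[int] = []
    for i in order:
        if expected_correct / n >= target:
            break
        # Manually labelling input i raises its correctness to 1.
        expected_correct += 1.0 - confidences[i]
        chosen.append(i)

    return sorted(chosen), expected_correct / n


if __name__ == "__main__":
    picked, accuracy = select_for_manual_labelling([0.99, 0.6, 0.95, 0.7], 0.95)
    print(picked, accuracy)
```

With non-uniform labelling costs or additional constraints, sorting is no longer guaranteed to be optimal, which is one reason a general integer-programming solver becomes useful for problems of this kind.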
