Classification with Conceptual Safeguards
Hailey Joren, Charles Marx, Berk Ustun

TL;DR
This paper introduces a conceptual safeguard framework for classification models that verifies intermediate concepts to enhance safety, coverage, and human oversight, especially under uncertainty.
Contribution
It presents a novel safeguard architecture that uses intermediate concept predictions to improve safety and coverage in classification tasks, with methods for uncertainty propagation and human review.
Findings
Improves model safety by abstaining on uncertain predictions.
Enhances coverage through human confirmation of concepts.
Benchmarks show increased performance and coverage in real-world datasets.
Abstract
We propose a new approach to promote safety in classification tasks with established concepts. Our approach -- called a conceptual safeguard -- acts as a verification layer for models that predict a target outcome by first predicting the presence of intermediate concepts. Given this architecture, a safeguard ensures that a model meets a minimal level of accuracy by abstaining from uncertain predictions. In contrast to a standard selective classifier, a safeguard provides an avenue to improve coverage by allowing a human to confirm the presence of uncertain concepts on instances on which it abstains. We develop methods to build safeguards that maximize coverage without compromising safety, namely techniques to propagate the uncertainty in concept predictions and to flag salient concepts for human review. We benchmark our approach on a collection of real-world and synthetic datasets,…
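The abstain-or-predict logic described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary concepts that are independent given the input, a hypothetical front-end model `toy_front_end` that maps a concept vector to P(y = 1), and an arbitrary confidence threshold of 0.8. Uncertainty is propagated by averaging the front-end output over every possible concept vector, weighted by its probability; a human confirmation is modeled by setting a concept probability to 0 or 1.

```python
from itertools import product
import math


def propagate(concept_probs, front_end):
    """Expected P(y=1) over all binary concept vectors.

    Assumption (for illustration): concepts are independent, so the
    probability of a concept vector is the product of per-concept terms.
    """
    total = 0.0
    for c in product((0, 1), repeat=len(concept_probs)):
        weight = math.prod(q if bit else 1.0 - q
                           for q, bit in zip(concept_probs, c))
        total += weight * front_end(c)
    return total


def safeguard(concept_probs, front_end, threshold=0.8):
    """Return a label (0 or 1), or None to abstain when confidence is low."""
    p = propagate(concept_probs, front_end)
    confidence = max(p, 1.0 - p)
    if confidence < threshold:
        return None  # abstain: the propagated prediction is too uncertain
    return int(p >= 0.5)


def toy_front_end(c):
    """Hypothetical front-end: outcome is likely only if all concepts are present."""
    return 0.95 if all(c) else 0.05


# Second concept is uncertain (0.5), so the safeguard abstains.
print(safeguard([0.9, 0.5], toy_front_end))

# A human confirms the second concept is present (probability set to 1.0),
# which raises confidence enough to output a prediction.
print(safeguard([0.9, 1.0], toy_front_end))
```

Exact enumeration visits 2^k concept vectors for k concepts, so this sketch is only practical for a small number of concepts.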
Peer Reviews
Decision: ICLR 2024 poster
The authors provide good motivation and a clear introduction. They also provide empirical validation on multiple datasets. The problem statement is highly relevant to practical problems and provides insight into how to automate classification tasks while keeping them safe and interpretable.
The writing and flow could be improved; some issues are raised in the questions below. Table 1 is referenced in Section 1, but the meaning of its columns is defined only in Section 2, which makes the table harder to read. It would also help to provide more detail on the evaluation datasets, including what each dataset represents and some summary statistics. In my opinion the paper lacks novelty in terms of the innovation, and answers to the questions raised would help to unders
The work appears to be the first to use concept bottleneck models to capture the uncertainty of the entire model for selective classification. Moreover, the idea of getting human feedback to confirm concepts to improve selective classification is quite interesting and adds to the increasing literature of human-in-the-loop algorithms. The paper is very well organized, has a clear structure, and is nicely written. The authors clearly state their contributions as well as the assumptions of their
Even though the meaning of coverage might be clear to experts in selective classification, it might be helpful to include a high-level definition of coverage in the introduction, so that it is clear for a broader ML audience. In Proposition 4, the authors assume a perfectly calibrated predictor. However, in practice, perfect calibration is impossible. As a result, it would be useful to include theoretical results that complement Proposition 4 and account for the calibration error a classifi
The three strengths of the proposed approach are a functional abstaining method, requests for confirmation, and uncertainty propagation. Together these methods raise a classification model to something that is more intelligent, capable of some corrective action when faced with unusual inputs.
1. The uncertainty propagation methodology doesn't seem computationally efficient. 2. The performance of the default classifier (always predict majority class, uniformly randomly abstain) ought to be included in Table 2. The default performance ought to always be presented when using accuracy as a performance metric.
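The default baseline the reviewer asks for (always predict the majority class, abstain uniformly at random) could be computed along these lines. This is a sketch under stated assumptions: the function name `majority_baseline`, the label list, and the seed are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def majority_baseline(labels, coverage, seed=0):
    """Always predict the majority class; keep each instance with
    probability `coverage` and abstain on the rest.

    Returns (majority_label, realized_coverage, accuracy_on_kept).
    Accuracy is None if every instance was abstained on.
    """
    rng = random.Random(seed)
    majority = Counter(labels).most_common(1)[0][0]
    kept = [y for y in labels if rng.random() < coverage]
    if not kept:
        return majority, 0.0, None
    accuracy = sum(y == majority for y in kept) / len(kept)
    return majority, len(kept) / len(labels), accuracy


# Hypothetical labels: 70 positives and 30 negatives.
labels = [1] * 70 + [0] * 30
print(majority_baseline(labels, coverage=1.0))
print(majority_baseline(labels, coverage=0.5))
```

Because abstention is random, the expected accuracy of this baseline equals the majority-class frequency at every coverage level, which is what makes it a useful floor when reporting accuracy.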
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
