
TL;DR
The fuzzy ROC extends traditional ROC analysis to handle uncertain data points by defining bounds for sensitivity and specificity, and provides visual tools to summarize multiple indeterminacy zone choices.
Contribution
It introduces a fuzzy ROC framework that incorporates indeterminacy regions, addressing sensitivity, specificity bounds, and visualization challenges in uncertain classification scenarios.
Findings
Defines sensitivity and specificity bounds under indeterminacy.
Provides visualization methods for multiple indeterminacy zones.
Enhances ROC analysis for uncertain data points.
The Fuzzy ROC
Giovanni Parmigiani
Dana Farber Cancer Institute, 450 Brookline Avenue, Boston 02115, U.S.A.
and Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston 02115, U.S.A.
Abstract
The fuzzy ROC extends Receiver Operating Curve (ROC) visualization to the situation where some data points, falling in an indeterminacy region, are not classified. It addresses two challenges: definition of sensitivity and specificity bounds under indeterminacy; and visual summarization of the large number of possibilities arising from different choices of indeterminacy zones.
Keywords: Receiver Operating Curves (ROC), Indeterminacy in classification
Journal: Pattern Recognition Letters
1 Introduction
Receiver Operating Curves (ROC) help with the visual assessment of the performance of classifiers. Fawcett (2006) reviews the field and points out that “ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research”.
I consider here the basic case of binary classification using a continuous score, such as a classification probability, or a quantitative biomarker. Traditionally, classification is simply implemented by a cutoff dichotomizing the score. In more recent applications, classification may include an intermediate area of indeterminacy, which I will call a gray zone.
For a famous example, Parker et al. (2009) present the PAM50 risk predictor of breast cancers, which provides a continuous risk score. In clinical applications, this score is most often split into three categories: low, intermediate and high. Women in the low and high categories are directed to specific clinical strategies. Women in the intermediate category are considered on a case-by-case basis by their clinicians. From an algorithmic standpoint, the intermediate group is not classified. Similarly, machine learning algorithms for classification of pathology and radiology images may allow for certain areas to be routed to further human examination. In these cases, indeterminacy helps with practical implementation, by handling safe cases algorithmically and complex ones by human intervention.
Here I describe an algorithm for visualizing bounds on sensitivity/specificity pairs, fuzzy ROC for short, to assess the performance range of classifiers allowing for a region of indeterminacy, or gray zone. I try to address two challenges. The first is the definition of sensitivity and specificity bounds when there is indeterminacy. The second is the visual summarization of the large number of possibilities arising from different choices of gray zones.
2 Algorithm
Consider a validation study of $n$ labeled subjects, with scores $x_i$, $i = 1, \ldots, n$. Without loss of generality, let the first $n_0$ subjects have label $0$ and the remaining $n_1 = n - n_0$ have label $1$. Also, low levels of the score are taken to predict class $0$. The proportion of $1$'s in the target population is $\pi$, and may differ from the validation study proportion $n_1/n$, for example if the validation study has a case-control design.
A gray zone is defined by the interval $[l, u]$. The extremes $l$ and $u$ are the lower and higher cutoffs. Cases with score below $l$ are classified as $0$'s. Cases with score above $u$ are classified as $1$'s. The rest remain unclassified.
Users of fuzzy ROC need to specify a maximum tolerated percentage of unclassified cases, $\gamma$. Let $g_c$ be the number of class $c$ points falling in the gray zone. A gray zone satisfies the $\gamma$-constraint if the proportion of cases in the gray zone is less than $\gamma$, that is if $(g_0 + g_1)/n < \gamma$. A gray zone satisfies the target population $\gamma$-constraint if $(1 - \pi)\, g_0/n_0 + \pi\, g_1/n_1 < \gamma$.
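The gray-zone bookkeeping described above can be sketched in Python. This is a minimal illustration, not the fuzzyROC R package: the function names are hypothetical, the gray zone is assumed to include its endpoints, and the target-population check assumes that the class-specific gray-zone proportions are reweighted by a prevalence `pi`.

```python
import numpy as np

def gray_zone_counts(scores, labels, lower, upper):
    """Return (g0, g1): numbers of class-0 and class-1 cases with lower <= score <= upper."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    in_zone = (scores >= lower) & (scores <= upper)
    g0 = int(np.sum(in_zone & (labels == 0)))
    g1 = int(np.sum(in_zone & (labels == 1)))
    return g0, g1

def satisfies_gamma(scores, labels, lower, upper, gamma, pi=None):
    """Check the gamma-constraint; if pi is given, reweight classes to that prevalence."""
    labels = np.asarray(labels, dtype=int)
    n0 = int(np.sum(labels == 0))
    n1 = int(np.sum(labels == 1))
    g0, g1 = gray_zone_counts(scores, labels, lower, upper)
    if pi is None:
        unclassified = (g0 + g1) / (n0 + n1)          # validation-study proportion
    else:
        unclassified = (1 - pi) * g0 / n0 + pi * g1 / n1  # assumed target-population version
    return unclassified < gamma

# Toy data (made up for illustration): five class-0 and five class-1 scores.
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(gray_zone_counts(scores, labels, 0.45, 0.6))
print(satisfies_gamma(scores, labels, 0.45, 0.6, gamma=0.35))
print(satisfies_gamma(scores, labels, 0.45, 0.6, gamma=0.35, pi=0.1))
```

In the toy data the gray zone holds two class-0 cases and one class-1 case, so the same zone can pass the constraint in the validation sample and fail it once class 0 is weighted more heavily in the target population.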
The fuzzy ROC algorithm is a model-free visualization. The basic building blocks are bounds on the cumulative frequencies associated with a given gray zone $[l, u]$.
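First, the most favorable bound on these frequencies is calculated assuming perfect discrimination within the gray zone. Imagine an oracle would take care of the points in the gray zone on behalf of the classifier, by moving them to the extremes of the gray zone so that they can be classified correctly. Formally, define the starred scores as follows:
$$x_i^* = \begin{cases} l & \text{if } x_i \in [l, u] \text{ and } i \le n_0, \\ u & \text{if } x_i \in [l, u] \text{ and } i > n_0, \\ x_i & \text{otherwise.} \end{cases}$$
Let $I(A)$ be the indicator of the set $A$, and define the cumulative frequencies:
$$F_0^*(x; l, u) = \frac{1}{n_0} \sum_{i=1}^{n_0} I(x_i^* \le x), \qquad F_1^*(x; l, u) = \frac{1}{n_1} \sum_{i=n_0+1}^{n} I(x_i^* \le x).$$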
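Conversely, the least favorable frequencies are constructed considering the worst case scenario for the points within the gray zone. Imagine now that a saboteur may be in charge of the points in the gray zone, by moving them to extremes of the gray zone, so that they are all classified incorrectly. This would result in the "daggered" scores, defined as:
$$x_i^\dagger = \begin{cases} u & \text{if } x_i \in [l, u] \text{ and } i \le n_0, \\ l & \text{if } x_i \in [l, u] \text{ and } i > n_0, \\ x_i & \text{otherwise.} \end{cases}$$
Now define the cumulative frequencies:
$$F_0^\dagger(x; l, u) = \frac{1}{n_0} \sum_{i=1}^{n_0} I(x_i^\dagger \le x), \qquad F_1^\dagger(x; l, u) = \frac{1}{n_1} \sum_{i=n_0+1}^{n} I(x_i^\dagger \le x).$$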
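The oracle and saboteur constructions can be sketched as follows. This is a hedged illustration under stated assumptions rather than the paper's implementation: `shifted_scores` and `class_cum_freq` are hypothetical names, the gray zone includes its endpoints, and the cumulative frequency is evaluated at a cutoff lying strictly inside the gray zone.

```python
import numpy as np

def shifted_scores(scores, labels, lower, upper, scenario):
    """Move gray-zone scores to the zone's extremes.

    'oracle' sends class 0 to `lower` and class 1 to `upper` (all correct);
    'saboteur' does the opposite (all incorrect). Other scores are unchanged.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    in_zone = (scores >= lower) & (scores <= upper)
    if scenario == "oracle":
        target = np.where(labels == 0, lower, upper)
    elif scenario == "saboteur":
        target = np.where(labels == 0, upper, lower)
    else:
        raise ValueError("scenario must be 'oracle' or 'saboteur'")
    return np.where(in_zone, target, scores)

def class_cum_freq(shifted, labels, cutoff, cls):
    """Proportion of class `cls` cases whose shifted score is <= cutoff."""
    labels = np.asarray(labels, dtype=int)
    return float(np.mean(np.asarray(shifted)[labels == cls] <= cutoff))

# Toy data (made up for illustration).
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
for scenario in ("oracle", "saboteur"):
    shifted = shifted_scores(scores, labels, 0.45, 0.6, scenario)
    f0 = class_cum_freq(shifted, labels, 0.525, 0)
    f1 = class_cum_freq(shifted, labels, 0.525, 1)
    print(scenario, f0, f1)
```

If class 1 is treated as the positive class (an assumption about the plotting convention), the class-0 frequency reads as specificity and one minus the class-1 frequency reads as sensitivity.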
We can form a large number of starred and daggered pairs of cumulative frequencies satisfying the $\gamma$-constraint. The fuzzy ROC algorithm simplifies the visualization of these pairs by grouping them, and selecting a single higher and lower limit within each group, as follows.
Consider the $K$ observed unique ranked values of the biomarker, $x_{(1)} < x_{(2)} < \cdots < x_{(K)}$. These points will constitute the set of possible values for the extremes of the gray zone. Now define the midpoints between two consecutive values as $m_k = (x_{(k)} + x_{(k+1)})/2$ for $k = 1, \ldots, K-1$. For each $m_k$, consider the set of pairs $(l, u)$ built by first adding the two neighboring observed points on either side, then the next two and so forth. This process continues as long as the gray zone satisfies the $\gamma$-constraint. If one of the extremes of the distribution is reached, the process continues on the other side. Among the resulting intervals, the fuzzy ROC chooses the "best" for visualization, defined as follows. For each $(l, u)$, it eliminates the cases in the gray zone and then computes the AUC on the classified cases only. The pair maximizing the AUC so defined is $(l_k, u_k)$. The generating $m_k$ is not necessarily the midpoint of this interval, but will be contained in it. If multiple gray zones are tied in this maximization, the algorithm minimizes gray zone size among optima. In this way, gray zones are not used in regions where discrimination is not helped by not classifying cases.
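The search around a single midpoint can be sketched as below. This is one possible reading of the procedure, not the fuzzyROC package code: the names are hypothetical, the index `k` is 0-based, the empty gray zone is assumed to be among the candidates, gray-zone size is measured by the number of cases, and the AUC is the Mann-Whitney estimate with ties counted as one half.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney estimate of P(class-1 score > class-0 score); ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        return float("nan")  # undefined when one class has no classified cases
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def best_gray_zone(scores, labels, k, gamma):
    """Grow gray zones around the midpoint of sorted unique values k and k+1.

    Returns (zone, auc_on_classified); zone is None when no gray zone helps.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    values = np.unique(scores)                        # sorted unique scores
    best_zone, best_auc = None, auc(scores, labels)   # start from "no gray zone"
    lo, hi = k, k + 1
    while True:
        in_zone = (scores >= values[lo]) & (scores <= values[hi])
        if in_zone.mean() >= gamma:                   # gamma-constraint violated
            break
        candidate = auc(scores[~in_zone], labels[~in_zone])
        # Strict improvement only: zones grow, so ties keep the smaller zone.
        if candidate > best_auc:
            best_zone = (float(values[lo]), float(values[hi]))
            best_auc = candidate
        if lo == 0 and hi == len(values) - 1:
            break
        lo = max(lo - 1, 0)                           # extend left if possible
        hi = min(hi + 1, len(values) - 1)             # extend right if possible
    return best_zone, best_auc

# Toy data (made up for illustration).
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_gray_zone(scores, labels, k=5, gamma=0.35))  # ((0.55, 0.6), 0.875)
print(best_gray_zone(scores, labels, k=8, gamma=0.35))  # no gray zone selected
```

In the toy data, removing the two overlapping cases around the sixth midpoint raises the AUC from 0.84 to 0.875, whereas near the top of the score range no gray zone improves discrimination, so none is used.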
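Then, the upper limits are defined by the set of points
$$\left( F_0^*(m_k; l_k, u_k),\; F_1^*(m_k; l_k, u_k) \right)$$
as $k$ varies. Conversely, the lower limits are defined by the set of points
$$\left( F_0^\dagger(m_k; l_k, u_k),\; F_1^\dagger(m_k; l_k, u_k) \right)$$
for $k = 0, \ldots, K$. To implement, define the degenerate gray zones $[l_0, u_0]$ and $[l_K, u_K]$ as the empty set.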
Fix $c$ to be either $0$ or $1$. The sequences defined by the starred and daggered cumulative frequencies $F_c^*(m_k; l_k, u_k)$ and $F_c^\dagger(m_k; l_k, u_k)$ as $k$ varies do not necessarily define proper cumulative distributions, as they would in a standard ROC analysis. Rather the intent is to provide bounds to the sensitivity / specificity pairs available over a range of possible gray area strategies.
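To see why these frequencies act as bounds, it may help to write them next to the ordinary empirical frequencies. The block below is a sketch under assumptions not stated in the text: $F_c$ denotes the cumulative frequency of the unshifted class $c$ scores, class $1$ is treated as the positive class, and the cutoff $m_k$ lies strictly inside a non-empty gray zone, $l_k < m_k < u_k$.

```latex
% Oracle: gray-zone class-0 scores move down to l_k, class-1 scores move up to u_k.
% Saboteur: the reverse. Hence, at a cutoff m_k with l_k < m_k < u_k,
\begin{align*}
F_0^{\dagger}(m_k; l_k, u_k) \;\le\; F_0(m_k) \;\le\; F_0^{*}(m_k; l_k, u_k), \\
F_1^{*}(m_k; l_k, u_k) \;\le\; F_1(m_k) \;\le\; F_1^{\dagger}(m_k; l_k, u_k).
\end{align*}
% Reading class 1 as positive and low scores as predicting class 0:
\begin{align*}
\text{specificity} &= F_0(m_k), &
\text{sensitivity} &= 1 - F_1(m_k),
\end{align*}
% so the starred pair bounds both quantities from above
% and the daggered pair bounds both from below.
```

When the selected gray zone is empty, the starred and daggered scores coincide with the observed ones and all three frequencies agree.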
Starred and daggered curves are calculated using both classified and unclassified samples. The exclusion of the unclassified samples only affects the calculation of $(l_k, u_k)$.
I explored an alternative implementation where the lower and upper limit of the gray area are used in turn to index the AUC optimization, instead of the midpoints. Upper and lower limits can produce markedly different results. Bounds are less stable than the midpoints when sample sizes are small. Nonetheless, this strategy provides a different view of the overlap in the tails, and may turn out to be useful in some applications.
3 Illustration
To illustrate the application and interpretation of the fuzzy ROC, I consider a gene expression biomarker for the prediction of suboptimal (class $0$) versus optimal (class $1$) surgical debulking in ovarian cancer patients. Data are available from the curatedOvarianData Bioconductor package by Ganzfried et al. (2013). Clinical and biological background can be found in Riester et al. (2014). The specific biomarker presented here reflects the transcriptional level of the gene ZNF544, as measured using an Agilent microarray by Yoshihara et al. (2012).
Figure 1 shows the observed biomarker levels by class. Higher levels of expression are generally associated with optimal debulking (class 1). Figure 1 also illustrates the type of hypothetical scenarios that enter as building blocks in the construction of the fuzzy ROC, to visually represent the starred and daggered definitions.
Each of the hypothetical scenarios in Figure 1 enters the optimization used to find the $(l_k, u_k)$'s. These in turn are used to form the starred and daggered sensitivity and specificity bounds. Figure 2 shows segments connecting starred and daggered points corresponding to the two bounds associated with the same $k$. These can be used to explore potential gray area strategies. Say one is interested in a classifier with approximately 80% specificity and 70% sensitivity. ZNF544 does not reach this performance. The upper points inform us that if one were allowed to pass 20% of suitably chosen observations to the oracle, then ZNF544 could reach close to the desired sensitivity/specificity trade-off. They also inform us that if the same observations were passed to the saboteur, the sensitivity and specificity would drop close to the diagonal line of no discrimination.
Figure 2 also shows, in the right panel, the region defined by the starred points as the upper limit, and by the daggered points as the lower limit. Points within the region are not easily interpretable in terms of the optimization of the previous section. The shading is purely a visual aid.
Figure 3 shows fuzzy ROC visualizations corresponding to four additional choices of $\gamma$.
Figure 2 also illustrates that the region defined by the upper and lower limits in the fuzzy ROC algorithm is not necessarily convex.
If $\gamma = 0$, the fuzzy ROC region collapses to the standard ROC line, also drawn in Figures 2 and 3.
In regions where the two class-specific distributions have little overlap, say left of , there can be little or no advantage in allowing for a gray zone. Conversely, where the density of biomarker points in the two classes is similar, a gray zone has the potential to improve the practical implementation of the biomarker. Figure 4 depicts this trade-off by showing where in the biomarker range the gray area is useful. Only in a narrow range of values does the fuzzy ROC algorithm need to make full use of the 20% of data points allowed for the gray zone (top panel).
Lastly, Figure 5 shows fuzzy ROCs for four additional genes, chosen in part to illustrate less common features. Regions can be disjoint, when stretches of non-empty gray areas are followed by stretches of empty gray areas. Often this is associated with lack of monotonicity in the likelihood ratio of the two conditional biomarker distributions.
ZNF487 exemplifies a biomarker with relatively good discrimination. The upper bound indicates that correct reclassification of as few as 20% of cases could lead to high discrimination. This reclassification could be achieved by biomarkers that prove effective in the gray zone for ZNF487. The lower bound indicates that, if unclassified observations are handled poorly, the performance suffers, but discrimination remains above chance by a clear margin even with a gray area of 20%.
4 Discussion
I am not aware of a good visualization approach to examine classification algorithms that allow for an area of indeterminacy. I hope the fuzzy ROC approach will prove of practical help.
Allowing for a gray area differs from multi-class ROC analysis (e.g. Hand and Till (2001)), where the number of labels is greater than two. It also differs from semisupervised analyses, where some cases are not labeled. Here all cases have a known binary label but some are not classified.
The fuzzy ROC is not a visualization of uncertainty about the ROC curve in the standard statistical sense. Both the upper and lower bounds are themselves point estimates, and their variability could be addressed by simple resampling approaches. Yet visualizing both the set and uncertainty about the set boundaries could be challenging. Also, $\gamma$ is expressed in terms of the (potentially rescaled) proportion of cases in the validation study, without consideration for uncertainty.
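One simple resampling approach of the kind mentioned above is a bootstrap of a single bound point. The sketch below is an assumption-laden illustration, not part of the paper's method: it resamples within each class to keep class sizes fixed, holds the gray zone fixed instead of re-optimizing it in each replicate, treats class 1 as positive, and uses hypothetical function names.

```python
import numpy as np

def saboteur_point(scores, labels, lower, upper):
    """Worst-case (specificity, sensitivity): every gray-zone case is misclassified."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    specificity = np.mean(scores[labels == 0] < lower)   # class 0 called 0
    sensitivity = np.mean(scores[labels == 1] > upper)   # class 1 called 1
    return float(specificity), float(sensitivity)

def bootstrap_interval(scores, labels, lower, upper, n_boot=2000, seed=0):
    """Percentile intervals (rows: 2.5%, 97.5%; columns: specificity, sensitivity)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(labels == 0)
    idx1 = np.flatnonzero(labels == 1)
    draws = np.empty((n_boot, 2))
    for b in range(n_boot):
        # Resample with replacement within each class (stratified bootstrap).
        idx = np.concatenate([rng.choice(idx0, size=idx0.size),
                              rng.choice(idx1, size=idx1.size)])
        draws[b] = saboteur_point(scores[idx], labels[idx], lower, upper)
    return np.percentile(draws, [2.5, 97.5], axis=0)

# Toy data (made up for illustration).
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(saboteur_point(scores, labels, 0.45, 0.6))
print(bootstrap_interval(scores, labels, 0.45, 0.6))
```

Because the gray zone is held fixed, intervals obtained this way would understate the variability that comes from selecting the zone itself.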
The fuzzyROC is not an approach for optimizing the size of the indeterminacy zone. It only uses optimization to home in on a useful subset of options for visualization.
The oracle and saboteur scenarios are extreme. Variants of this algorithm could be constructed by further specifying bounds on the proportion of cases that could be correctly classified by a human if left in the gray area. Then instead of moving all the gray area points to extremes, these known proportions could be used to move only some of the points and achieve less extreme bounds. These classification proportions could potentially depend on the biomarker region.
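One way such a variant might look is sketched below. It is a hypothetical construction, not something specified in the paper: a single proportion `p_correct` of gray-zone cases is assumed to be classified correctly in both classes, expected counts are used in place of moving individual points, and class 1 is treated as positive.

```python
import numpy as np

def partial_bound(scores, labels, lower, upper, p_correct):
    """(specificity, sensitivity) when a share p_correct of gray-zone cases is called correctly.

    p_correct = 1 reproduces the oracle point and p_correct = 0 the saboteur point.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    in_zone = (scores >= lower) & (scores <= upper)
    below = scores < lower                       # classified as class 0 by the cutoff
    called_zero = {}
    for cls in (0, 1):
        is_cls = labels == cls
        n_c = is_cls.sum()
        g_c = (in_zone & is_cls).sum()
        # Share of this class's gray-zone cases that end up called class 0.
        share = p_correct if cls == 0 else 1.0 - p_correct
        called_zero[cls] = float(((below & is_cls).sum() + share * g_c) / n_c)
    return called_zero[0], 1.0 - called_zero[1]

# Toy data (made up for illustration).
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.3, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
for p in (0.0, 0.5, 1.0):
    print(p, partial_bound(scores, labels, 0.45, 0.6, p))
```

Letting `p_correct` depend on the class or on the score region would follow the same pattern, with one share per class or per region instead of a single number.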
From a statistical perspective, indeterminacy can also help characterize regions of the score with poor discriminatory ability. Thus, compared to fully deterministic approaches, allowing for indeterminacy may lead to a different evaluation of classifiers and different approaches to biomarker discovery.
Acknowledgments
Work supported by NIH grant 4P30CA006516-51 and NSF grant DMS-1810829. Work currently submitted to Pattern Recognition Letters. The fuzzyROC R package used to produce the analysis presented in this paper is freely available for direct install at https://github.com/gp1d/fuzzyROC.git.
References
- Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27 (8), 861–874, ROC Analysis in Pattern Recognition. URL http://www.sciencedirect.com/science/article/pii/S016786550500303X
- Ganzfried, B. F., Riester, M., Haibe-Kains, B., Risch, T., Tyekucheva, S., Jazic, I., Wang, X. V., Ahmadifar, M., Birrer, M. J., Parmigiani, G., Huttenhower, C., Waldron, L., 2013. curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome. Database (Oxford) 2013, bat013, PMCID: PMC3625954. URL http://dx.doi.org/10.1093/database/bat013
- Hand, D. J., Till, R. J., Nov 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45 (2), 171–186. URL https://doi.org/10.1023/A:1010920819831
- Parker, J. S., Mullins, M., Cheang, M. C. U., Leung, S., Voduc, D., Vickery, T., Davies, S., Fauron, C., He, X., Hu, Z., Quackenbush, J. F., Stijleman, I. J., Palazzo, J., Marron, J. S., Nobel, A. B., Mardis, E., Nielsen, T. O., Ellis, M. J., Perou, C. M., Bernard, P. S., Mar. 2009. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes. Journal of Clinical Oncology 27 (8), 1160–1167.
- Riester, M., Wei, W., Waldron, L., Culhane, A. C., Trippa, L., Oliva, E., Kim, S.-H., Michor, F., Huttenhower, C., Parmigiani, G., Birrer, M. J., Apr 2014. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J Natl Cancer Inst. URL http://dx.doi.org/10.1093/jnci/dju048
- Yoshihara, K., Tsunoda, T., Shigemizu, D., Fujiwara, H., Hatae, M., Fujiwara, H., Masuzaki, H., Katabuchi, H., Kawakami, Y., Okamoto, A., Nogawa, T., Matsumura, N., Udagawa, Y., Saito, T., Itamochi, H., Takano, M., Miyagi, E., Sudo, T., Ushijima, K., Iwase, H., Seki, H., Terao, Y., Enomoto, T., Mikami, M., Akazawa, K., Tsuda, H., Moriya, T., Tajima, A., Inoue, I., Tanaka, K., 2012. High-risk ovarian cancer based on 126-gene expression signature is uniquely characterize
