Enumeration of Distinct Support Vectors for Interactive Decision Making
Kentaro Kanamori, Satoshi Hara, Masakazu Ishihata, Hiroki Arimura

TL;DR
This paper introduces an efficient algorithm to enumerate the top K support vector machine models with distinct support vectors, enabling interactive model selection based on criteria beyond accuracy.
Contribution
It presents a novel K-best model enumeration algorithm for SVMs that efficiently finds multiple models with different support vectors for interactive decision-making.
Findings
Algorithm efficiently enumerates models with distinct support vectors.
Experiments demonstrate the method's effectiveness and usefulness.
Supports interactive model examination based on user requirements.
Preliminary work. Under review by the International Conference on Machine Learning (ICML). Do not distribute.
Abstract
In conventional prediction tasks, a machine learning algorithm outputs a single best model that globally optimizes its objective function, which is typically accuracy. Users therefore cannot access other models explicitly. In contrast, multiple model enumeration is attracting increasing interest in non-standard machine learning applications where criteria other than accuracy, e.g., interpretability or fairness, are the main concern and a user may want to access more than one non-optimal but suitable model. In this paper, we propose a $K$-best model enumeration algorithm for Support Vector Machines (SVM) that, given a dataset $S$ and an integer $K > 0$, enumerates the $K$-best models on $S$ with distinct support vectors in descending order of the objective function values in the dual SVM problem. Based on an analysis of the lattice structure of support vectors, our algorithm efficiently finds the next best model with small latency. This is useful for supporting users' interactive examination of their requirements on enumerated models. Through experiments on real datasets, we evaluated the efficiency and usefulness of our algorithm.
1 Introduction
Machine learning technologies are being widely applied to decision making in the real world. Recently, non-standard learning problems with criteria other than prediction accuracy, such as interpretability Ribeiro et al. (2016); Angelino et al. (2017) and fairness Hajian et al. (2016); Crawford (2017), have attracted increasing attention. If the predictions of a learning algorithm do not suit a user's requirements or violate critical constraints, the learned model may no longer be usable in practice, even if it has high prediction accuracy.
To incorporate a user's requirements into the learning process, a new framework called model enumeration has recently been proposed Hara & Maehara (2017); Ruggieri (2017); Hara & Ishihata (2018). In this framework, an algorithm enumerates several models with different structures, possibly with the same objective values, instead of finding a single optimal model. Enumerating models has a number of advantages. Previous work Hara & Maehara (2017) studied model enumeration focusing on the enumeration of subsets of features. In contrast, we focus on the enumeration of distinct models based on subsets of examples in a given dataset.
In this study, we propose an enumeration algorithm for Support Vector Machines (SVM) Vapnik (1998). In the dual form of the SVM learning problem, the decision boundary, i.e., the model, is represented by a linear combination of a subset of the given dataset, whose elements are called support vectors. Adopting the dual form of the SVM learning problem and extending the enumeration method for Lasso by Hara & Maehara (2017), we present an algorithm for enumerating SVM models that have distinct support vectors in descending order of the dual objective function values. Our approach has the following advantages:
- **Data understanding:** A single model that optimizes its objective function is not necessarily the model that best explains the data, due to, e.g., label noise or data contamination. By enumerating many models, we have a chance to find models that better match the user's interests. This can be seen as a multiple-model version of example-based explanation Bien & Tibshirani (2011).
- **Interactive learning:** In a long-term prediction service, a single optimal model may not remain the best model forever because a user's interests or requirements change. Our framework can provide the next best model at a user's request, so that the user can interactively examine and select among the enumerated models.
Contributions
In this paper, we make the following contributions.
1. We formulate a model enumeration problem for SVM as the enumeration of SVM models with distinct support vectors in descending order of the objective values.
2. We propose an efficient exact algorithm for the SVM model enumeration problem by extending the approach for Lasso by Hara & Maehara (2017). Our algorithm can be extended to efficient top-$K$ enumeration.
3. Through experiments on real datasets, we evaluate the efficiency and effectiveness of our algorithm. We also show that there exist several models with different prediction results and fairness scores although they have almost equal objective function values.
Related Work
Model enumeration has attracted increasing attention in recent years. Enumeration algorithms for several machine learning models, such as Lasso Hara & Maehara (2017), decision trees Ruggieri (2017), and rule models Hara & Ishihata (2018), have been proposed. In addition, a method for simultaneously learning multiple diverse classifiers has been proposed Ross et al. (2018).
Example-based explanations are widely used for interpreting the distribution of a dataset. Several methods for selecting representative examples from a dataset, such as prototypes Bien & Tibshirani (2011) and criticisms Kim et al. (2016), have been proposed. Our method differs from these in that it is based on the support vectors that represent an SVM model, and it enumerates them in descending order of the objective value.
In the context of SVM, the solution path Hastie et al. (2004) is a method for tracing how the obtained model changes as the regularization parameter varies monotonically. It is similar to our problem in that it generates different SVM models. However, our problem fixes the regularization parameter, whereas a solution path algorithm varies it, and our algorithm outputs a wider variety of models. The uniqueness of the SVM solution was discussed by Burges & Crisp (2000).
2 Preliminaries
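Let $\mathbb{R}$ and $\mathbb{N}$ be the sets of all real numbers and all positive integers, respectively. For any $n \in \mathbb{N}$, we denote $\{1, \dots, n\}$ by $[n]$. For any indexed set $S = \{s_1, \dots, s_n\}$ and index subset $I \subseteq [n]$, the subset of $S$ indexed by $I$ is defined by $S_I = \{ s_i \mid i \in I \}$. We denote by $\mathcal{X}$ and $\mathcal{Y}$ the input and output domains, respectively. In this paper, we assume binary classification, i.e., $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{Y} = \{-1, +1\}$ for some $d \in \mathbb{N}$. A dataset of size $n$ is a finite set $S = \{(x_i, y_i)\}_{i=1}^{n} \subseteq \mathcal{X} \times \mathcal{Y}$. A model is any function $m : \mathcal{X} \to \mathcal{Y}$, and a model space is any set $\mathcal{M}$ of models. For other definitions, see, e.g., Hastie et al. (2001).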
2.1 Support Vector Machines (SVM)
In the following discussion, we assume as hyperparameters a positive definite kernel function $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and a positive number $C > 0$, called a regularization parameter. In the following, we fix $S$, $\kappa$, and $C$, and omit them when they are clear from context. Note that our results are independent of the choice of $\kappa$ and $C$.
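In this paper, we consider the dual form of SVMs Cristianini & Shawe-Taylor (2000). We assume a given dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$. For any $n$-dimensional vector $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$, the objective function of SVMs, $f(\alpha)$, is defined by

$$f(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j). \tag{1}$$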
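The feasible solution space (or the model space) for SVMs, $\mathcal{F}$, is defined as the set of all Lagrange multipliers $\alpha \in \mathbb{R}^n$ satisfying conditions (i) and (ii) below:

$$\text{(i)}\ \ 0 \le \alpha_i \le C \ \text{ for all } i \in [n], \qquad \text{(ii)}\ \ \sum_{i=1}^{n} \alpha_i y_i = 0. \tag{2}$$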
Now, the (ordinary) SVM learning problem is stated as the following maximization problem:

$$\max_{\alpha \in \mathcal{F}} f(\alpha). \tag{3}$$

Since the problem of Eq. (3) is a convex quadratic programming problem, any solution found is globally optimal, though not necessarily unique, and one such solution can be computed efficiently by various methods such as SMO Platt (1999).
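By using $\alpha$, the prediction model (or the SVM model) associated with $\alpha$ is given by

$$m_\alpha(x) = \operatorname{sgn}\Big( \sum_{i=1}^{n} \alpha_i y_i \kappa(x_i, x) + b \Big),$$

where $x \in \mathcal{X}$ is an input, and the threshold $b$ is determined by $b = y_j - \sum_{i=1}^{n} \alpha_i y_i \kappa(x_i, x_j)$ for any $j$ such that $0 < \alpha_j < C$. Since a model $m_\alpha$ is solely determined by $\alpha$, we also call $\alpha$ a model as well as $m_\alpha$.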
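It is known that an optimal solution for an SVM tends to be a sparse vector. For any $\alpha \in \mathcal{F}$, we denote its support and support vectors by $\mathrm{supp}(\alpha) = \{ i \in [n] \mid \alpha_i \neq 0 \}$ and $SV(\alpha) = S_{\mathrm{supp}(\alpha)}$, respectively. From Eq. (1), we have the next lemma, which says that the value of the objective function depends only on the support vectors $SV(\alpha)$.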
Lemma 1
For any such that for some , .
Proof.
Since for any , we have .
From the definition of the prediction model in Section 2.1, we also see that the prediction result of an SVM model depends only on the set of its support vectors.
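As a concrete illustration of the quantities in this subsection, the following minimal Python sketch evaluates the dual objective and extracts the support of a multiplier vector. It assumes the standard dual objective $f(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)$ with a linear kernel. The function names (`dual_objective`, `support`, `restrict`) and the toy data are hypothetical and are not taken from the authors' implementation.

```python
def linear_kernel(u, v):
    """Inner product of two feature vectors (linear kernel)."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y, kernel=linear_kernel):
    """Standard SVM dual objective:
    sum_i a_i - 1/2 * sum_ij a_i a_j y_i y_j k(x_i, x_j)."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad


def support(alpha, tol=1e-12):
    """Indices whose multiplier is nonzero (up to a numerical tolerance)."""
    return [i for i, a in enumerate(alpha) if abs(a) > tol]


def restrict(seq, index_subset):
    """Sub-sequence of seq selected by index_subset."""
    return [seq[i] for i in index_subset]


# Toy data (an assumption for illustration): two points at +/-1 on the first
# axis, plus a far-away positive point that is not a support vector.
X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 3.0)]
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]  # feasible: sum_i alpha_i * y_i = 0

idx = support(alpha)
full_value = dual_objective(alpha, X, y)
# Dropping the non-support example leaves the objective value unchanged.
restricted_value = dual_objective(
    restrict(alpha, idx), restrict(X, idx), restrict(y, idx)
)
```

In this toy case the third example has a zero multiplier, so evaluating the objective on the two supported examples alone gives the same value as evaluating it on the full dataset.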
3 Problem Formulation
Before introducing our enumeration problem for SVMs, we define the constrained SVM learning problem below. For any index subset $I \subseteq [n]$, the constrained problem associated with $I$ is problem (3) with the additional constraint $\mathrm{supp}(\alpha) \subseteq I$, i.e., $\alpha_i = 0$ for all $i \notin I$. Note that problem (5) is equivalent to problem (3) when the input is restricted to the subset $S_I$.
Definition 1
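For any given index subset $I \subseteq [n]$, the constrained SVM learning problem with respect to $I$ is expressed as the following maximization problem:

$$\max_{\alpha \in \mathcal{F}_I} f(\alpha), \tag{5}$$

where $\mathcal{F}_I$ is the constrained model space (or the feasible solution space) consisting of all Lagrange multipliers $\alpha \in \mathbb{R}^n$ satisfying conditions (i) and (ii) of Eq. (2) and the additional condition (iii) $\alpha_i = 0$ for all $i \notin I$, i.e., $\mathrm{supp}(\alpha) \subseteq I$.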
In the above definition, the solution is called a support vector w.r.t. . We remark that the value does not depend on the choice of since by condition (iii) and Lemma 1.
Then, we denote the set of globally optimal solutions by where is the optimum value for the objectives.
The following property plays a key role in the analysis of our algorithm proposed later.
Proposition 1 (key property of solutions)
Let be any index subset and be any solution w.r.t. . For any , .
Proof.
By assumption, we have (a) implies , and (b) implies . From (a) and (b), we have that if is optimal in , it is also optimal in . Thus, is proved.
An algorithm for the constrained SVM problem is any deterministic algorithm that given as well as , and , computes a solution for the SVM problem. From Proposition 1, we make the following assumption on throughout this paper.
Assumption 1
For any , satisfies that implies .
To justify Assumption 1, note that if the objective function is strictly convex, the set has a unique solution Burges & Crisp (2000), and thus the assumption holds. If is not strictly convex, it is only known that is itself a convex set Burges & Crisp (2000). It remains open whether a greedy variable selection strategy such as those in the SMO Platt (1999) or chunking Vapnik (1998) algorithms is sufficient to ensure Assumption 1.
Under the above assumption, the solution set for our enumeration problem on input is the collection of the distinct SVM models computed by for all possible index subsets of . We observe that the corresponding set of the supports is isomorphic to the quotient set w.r.t. the equivalence relation defined by where each representative is written as .
Our goal is to enumerate all models of that have distinct support vectors in the descending order of their objective function values. Now, we state our problem as follows.
Problem 1 (Enumeration problem for SVMs)
Given any dataset , parameter , and kernel function , the task is to enumerate all distinct models in in the descending order of their objective function values without duplicates.
Note that we fix the regularization parameter, unlike the solution path for SVMs Hastie et al. (2004).
To solve Problem 1, a straightforward but infeasible method is to simply collect over all exponentially many subsets in . This is redundant w.r.t. since some pair of subsets and may yield the same solution if they are equivalent. Hence, we seek a more efficient method that utilizes the sparseness of the SVM models in .
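To make the redundancy of the straightforward method concrete, the sketch below calls a solver on every nonempty index subset and counts how many distinct supports come back. The solver is a hypothetical stand-in, not an SVM: it simply returns the single heaviest index in the subset. This is enough to show that exponentially many solver calls can collapse to only a few distinct solutions.

```python
from itertools import combinations


def brute_force_supports(n, solve):
    """Call solve on all 2^n - 1 nonempty index subsets.

    solve(index_set) -> (value, support) is assumed to be deterministic.
    Returns the distinct (support, value) pairs in descending order of value,
    together with the number of solver calls.
    """
    distinct = {}
    calls = 0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            value, supp = solve(frozenset(subset))
            calls += 1
            distinct[supp] = value  # equivalent subsets map to the same support
    ranked = sorted(distinct.items(), key=lambda item: -item[1])
    return ranked, calls


def heaviest_index_solver(weights):
    """Toy stand-in for a constrained solver: pick the heaviest allowed index."""
    def solve(index_set):
        best = max(index_set, key=lambda i: weights[i])
        return weights[best], frozenset({best})
    return solve


ranked, calls = brute_force_supports(4, heaviest_index_solver([8, 4, 2, 1]))
```

With four indices the loop makes 15 solver calls but finds only four distinct supports, which is the kind of waste a sparsity-aware search tries to avoid.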
4 Algorithm
In this section, we propose an efficient algorithm EnumSV for solving Problem 1. EnumSV is based on Lawler's framework Lawler (1972) for top-$K$ enumeration, following the approach of Hara and Maehara for Lasso Hara & Maehara (2017).
4.1 The outline of our algorithm
In Algorithm 1, we show the outline of our algorithm EnumSV. It maintains as a data structure , which is a priority queue (or a heap) Cormen et al. (2009), to store triples consisting of
- a discovered solution (a Lagrange multiplier),
- an index set associated to by , and
- a forbidden set to avoid searching redundant children of .
Triples are ordered in descending order of their objective values as keys. For the heap, we can insert any triple and extract (deletemax) the triple with the maximum key, each in time logarithmic in the heap size Cormen et al. (2009).
In Lawler's framework, we compute an optimal solution for each subproblem while avoiding subproblems that yield redundant solutions.
Base Case: Initially, EnumSV starts by inserting the first triple at Line 3, where corresponds to the solution for the ordinary SVM problem.
Inductive Case: While is not empty, EnumSV then repeats the following steps in the while-loop:
Step 1
Extract a triple from the heap at Line 6, where is called a candidate. Insert vector to as a solution at Line 7 if it has not been found yet.
Step 2
Repeat the following steps for any :
1. Branch the search spaces as at Line 9.
2. Compute at Line 10 and insert the triple , called a child of , into the heap at Line 11.
3. Insert into to avoid inserting the same index subset into the heap twice at Line 12.
Step 3
Go back to Step 1 if the heap is not empty.
The most important step of EnumSV in Algorithm 1 is Step 2 above. Based on Proposition 1, it branches the search on each index . This lets us avoid redundant computations that would yield a solution that has already been output. To avoid enumerating the same index subset multiple times, we add the used index into .
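The branching scheme described above can be sketched in Python with the standard `heapq` module. This is a minimal illustration under stated assumptions, not the authors' implementation: `enum_models` and `make_top2_solver` are hypothetical names, and the solver is a toy stand-in for a constrained SVM solver. It returns the two heaviest indices of the allowed index set, which is deterministic and returns the same support whenever that support is still allowed.

```python
import heapq
from itertools import count


def enum_models(n, solve):
    """Lawler-style enumeration of distinct supports in descending value order.

    solve(index_set) must return (value, support) for the best solution whose
    support lies inside index_set (a frozenset), or None if none exists.
    """
    tie = count()  # unique tie-breaker so the heap never compares sets
    full = frozenset(range(n))
    value, supp = solve(full)
    # heapq is a min-heap, so keys are negated to emulate deletemax.
    heap = [(-value, next(tie), supp, full, frozenset())]
    seen = set()
    while heap:
        neg_value, _, supp, index_set, forbidden = heapq.heappop(heap)
        if supp not in seen:  # output each distinct support once
            seen.add(supp)
            yield -neg_value, supp
        # Branch on every support index that is not forbidden.
        for i in sorted(supp - forbidden):
            child_set = index_set - {i}
            child = solve(child_set)
            if child is not None:
                child_value, child_supp = child
                heapq.heappush(
                    heap,
                    (-child_value, next(tie), child_supp, child_set, forbidden),
                )
            # Later siblings must keep index i available, so forbid branching on it.
            forbidden = forbidden | {i}


def make_top2_solver(weights):
    """Toy stand-in solver: support = the (at most) two heaviest allowed indices."""
    def solve(index_set):
        if not index_set:
            return None
        chosen = sorted(index_set, key=lambda i: -weights[i])[:2]
        return sum(weights[i] for i in chosen), frozenset(chosen)
    return solve


results = list(enum_models(4, make_top2_solver([8, 4, 2, 1])))
```

Because `enum_models` is a generator, stopping after the first few yielded items (for example with `itertools.islice`) gives a top-ranked prefix without exploring the rest of the heap.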
4.2 The correctness
In this subsection, we show the correctness of EnumSV in Algorithm 1 on input through Properties 1 and 2 below. For every , denotes the -th solution in by EnumSV. We first show a main technical lemma.
Lemma 2
For any feasible solution , there exists some extracted from such that (i) , (ii) , and (iii) , where .
Proof.
Let . Starting from the initial triple , we will go down the search space by visiting triples from a parent to its child for , while maintaining the invariant (ii). Base case: For , the first triple clearly satisfies the invariant . From Lemma 1, if satisfies condition (i), the claim immediately follows.
Induction case: Let . Suppose inductively that satisfies (ii) . Then, there are two cases (1) and (2) below on the inclusion :
Case (1): holds. By induction hypothesis, we have , and thus, . Since is an optimal solution within , it follows that .
Case (2): holds. For any , EnumSV inserts into the heap the triple at Line 11, and it will be eventually extracted as the -th triple at some . By induction hypothesis, and hold, and thus, we have an invariant for the child iteration with . By the above arguments, at every time following a path to a child in Case (2), the size of the difference decrements at least by one. Since , this process must eventually halt at Case (1). This completes the proof.
From Lemma 2, we can show the next lemma, saying that EnumSV eventually outputs any solution.
Lemma 3 (Property 1)
In EnumSV, for any subset , there exists some such that .
Proof.
For , it follows from Lemma 2 that there exists such that and . Since , we have .
Also from Lemma 2, we have the next lemma for the top-$K$ computation, which says that EnumSV lists solutions exactly from larger to smaller values of .
Lemma 4 (Property 2)
EnumSV enumerates solutions in the descending order of their objective function , i.e., .
Proof.
We show for any as follows. Suppose that is extracted by deletemax from the heap at step . If is in the heap, then immediately holds. Otherwise, there exists the triple where in the heap such that . Since , holds. From the definition of the heap, we have .
By combining Lemma 3 and Lemma 4, we show the main result of this paper.
Theorem 1
EnumSV in Algorithm 1 solves Problem 1.
Proof.
From Lemmas 3 and 4, EnumSV returns a collection of models that satisfy Properties 1 and 2. Thus, EnumSV solves Problem 1.
4.3 Top-$K$ Enumeration
We can modify Algorithm 1 to find the top-$K$ models for a given positive integer $K$ as follows. We run Algorithm 1, enumerating models in descending order of their objective values, and stop as soon as $K$ models have been output. From Lemma 4, the output then contains the top-$K$ models.
Complexity.
For enumeration algorithms, it is customary to analyze time complexity in terms of the number of solutions, i.e., in an output-sensitive manner Avis & Fukuda (1996). However, this is difficult here because more than one equivalent candidate can result in the same model . Instead, we estimate the time complexity in terms of a candidate solution extracted in Line 6. The time complexity of Algorithm 1 for obtaining a candidate solution of -th solution is , where is the complexity of solving an SVM problem.
5 Experiments
In this section, we evaluate our algorithm by experiments on real datasets. All code was implemented in Python 3.6 with scikit-learn. We used the linear kernel as the kernel function in all experiments. All experiments were conducted on 64-bit Ubuntu 18.04.1 LTS with an Intel Xeon E5-1620 v4 3.50GHz CPU and 62.8GiB of memory.
5.1 UCI Datasets
We first evaluated EnumSV on three real datasets, German (), Ionosphere (), and Sonar (), from the UCI ML repository Dheeru & Karra Taniskidou (2017). Each task is binary classification. We randomly split each dataset into train () and test () samples, and evaluated the test loss by the hinge loss . For each dataset, the hyperparameter was selected by -fold cross validation among before enumeration.
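Assuming the usual hinge loss $\max(0, 1 - y\,g(x))$ for a real-valued decision score $g(x)$ and a label $y \in \{-1, +1\}$, averaged over the test examples (the averaging is an assumption of this sketch), the test loss could be computed as follows. The function name `hinge_loss` and the example values are hypothetical.

```python
def hinge_loss(y_true, scores):
    """Mean hinge loss max(0, 1 - y * score) for labels in {-1, +1}."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    total = sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores))
    return total / len(y_true)


# Three test examples: a confident correct score, a correct score inside
# the margin, and a misclassified example.
example_loss = hinge_loss([1, -1, 1], [2.0, -0.5, -1.0])
```

The first example contributes no loss, the second contributes 0.5 because it lies inside the margin, and the third contributes 2 because it is on the wrong side.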
We applied EnumSV to these datasets and enumerated the top- models. Figure 1 presents the ratio of the objective function value and the ratio of the test loss of the -th enumerated model to those of the best model . Figure 1 (a) shows that the values of the objective function decrease as the rank increases, as expected from Theorem 1. For the German dataset, the objective function values of the top- models were almost the same, within a deviation of . This indicates that there are multiple models achieving almost identical objective values. Figure 1 (b) shows that some enumerated models, such as for German, for Ionosphere, and for Sonar, had smaller test loss than the optimal model . This means that an optimal model is not always the best model, and that by enumerating models we obtained a model with a lower test loss than the optimal one.
Figure 2 presents the total time of enumerating the top- models. The total time appears almost linear in the rank . Consequently, we conclude that EnumSV has small latency for outputting solutions independent of their ranks, and thus is scalable in the number of enumerated models.
5.2 Injected COMPAS Dataset
Next, we demonstrate an application of EnumSV to a fair classification scenario under false data injection attacks. To evaluate the fairness of the model for the sensitive attribute , we used demographic parity (DP) Calders et al. (2009) defined by
[TABLE]
where is a probability on the joint distribution over . We note that the larger the DP, the larger the discrimination of prediction.
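As an illustration, a common form of demographic parity is the absolute difference between the positive prediction rates of the two groups defined by a binary sensitive attribute. The sketch below uses that form as an assumption; the function name `demographic_parity`, the 0/1 encoding of the sensitive attribute, and the example values are hypothetical.

```python
def demographic_parity(predictions, sensitive, positive=1):
    """|P(pred = positive | s = 1) - P(pred = positive | s = 0)| on a sample.

    predictions: predicted labels; sensitive: 0/1 sensitive attribute values.
    """
    group1 = [p for p, s in zip(predictions, sensitive) if s == 1]
    group0 = [p for p, s in zip(predictions, sensitive) if s == 0]
    if not group1 or not group0:
        raise ValueError("both sensitive groups must be non-empty")
    rate1 = sum(p == positive for p in group1) / len(group1)
    rate0 = sum(p == positive for p in group0) / len(group0)
    return abs(rate1 - rate0)


# Group s=1 receives positive predictions twice as often as group s=0.
dp_example = demographic_parity([1, 1, -1, 1, -1, -1], [1, 1, 1, 0, 0, 0])
```

A value of zero means both groups receive positive predictions at the same rate, and larger values indicate a larger disparity, in line with the remark above.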
We used the COMPAS dataset () on recidivism risk prediction, distributed by Adebayo (2018). The task is to predict whether an individual recidivates within two years from their criminal history. We used the attribute "African_American" as the sensitive attribute . We assume a scenario of false data injection Mo et al. (2010), a special kind of attack on learning algorithms that increases the DP of the learned model for the sensitive attribute by flipping the output labels of a small subset of the training dataset. To reproduce this scenario, we generated injected subsets of COMPAS by the following steps:
1. Create a training dataset by randomly sampling a subset of COMPAS with examples.
2. Randomly choose a subset of such that with examples, and replace these outputs by .
3. Create a test dataset by randomly sampling from COMPAS with examples.
In preliminary experiments, we confirmed that the above procedure increases the DP of SVM models on .
We applied EnumSV to the above injected COMPAS dataset, and measured the objective function values, demographic parity (DP), and misclassification ratio of the top-50 enumerated models. We observed that all of the enumerated top-50 models had the same objective value. However, their prediction results were mutually different.
Figure 2 (a) presents the DP of the enumerated models, where the dashed line indicates the reference DP value of the model learned from the non-injected subset of the input . Figure 2 (b) presents the misclassification ratio of the enumerated models on . From the figures, we observed that EnumSV found three fair models , , and achieving lower DP than and lower misclassification ratio than . Consequently, by enumerating models, EnumSV successfully obtained several models that remain fair under false data injection.
6 Conclusion
In this paper, we proposed an efficient algorithm to enumerate the top-$K$ SVM models with distinct support vectors in descending order of their objective function values. Through experiments on real datasets, we demonstrated that our framework provides better models than a single optimal solution, as well as fair models under false data injection, which increases the unfairness of an optimal model. As future work, we will try to provide a theoretical or empirical justification of Assumption 1 for particular classes of SVM learning algorithms such as chunking Vapnik (1998) and SMO Platt (1999). Another interesting direction is to extend our algorithm to enumerate models while taking their diversity into account, so as to interactively help users understand a dataset.
Acknowledgements
This work was partially supported by JSPS KAKENHI(S) 15H05711 and JSPS KAKENHI(A) 16H01743.
References
- Adebayo, J. Fair ML: Auditing black-box predictive models. https://github.com/adebayoj/fairml, 2018.
- Angelino, E., Larus-Stone, N., Alabi, D., Seltzer, M., and Rudin, C. Learning certifiably optimal rule lists. In Proc. ACM KDD 2017, Halifax, August, pp. 35–44, 2017.
- Avis, D. and Fukuda, K. Reverse search for enumeration. Discrete Applied Mathematics, 65(1-3):21–46, 1996.
- Bien, J. and Tibshirani, R. Prototype selection for interpretable classification. The Annals of Applied Statistics, 5(4):2403–2424, 2011.
- Burges, C. J. and Crisp, D. J. Uniqueness of the SVM solution. In NIPS 1999, pp. 223–229, 2000.
- Calders, T., Kamiran, F., and Pechenizkiy, M. Building classifiers with independency constraints. In IEEE ICDM 2009 Workshops, pp. 13–18, Dec 2009.
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.
- Crawford, K. The trouble with bias. NIPS 2017, invited talk, Long Beach, USA, 2017.
