A Modified Construction for a Support Vector Classifier to Accommodate Class Imbalances
Matt Parker, Colin Parker

TL;DR
This paper proposes a modified support vector classifier that adjusts margins based on class variance to better handle imbalanced data, improving classification accuracy.
Contribution
It introduces a novel SVM formulation with class-specific margins proportional to class standard deviations, enhancing performance on imbalanced datasets.
Findings
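The standard SVM gives the lower-variance class an unjustifiably wide berth, which raises the chance of misclassifying the other class
Class-specific margins, each proportional to the class standard deviation perpendicular to the hyperplane, are proposed to counter this bias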
The modified construction coincides with the standard SVM when class variances are equal
Taxonomy
Topics: Imbalanced Data Classification Techniques · Face and Expression Recognition · Advanced Statistical Methods and Models
Methods: Support Vector Machine
Abstract
Given a training set with binary classification, the Support Vector Machine identifies the hyperplane maximizing the margin between the two classes of training data. This general formulation is useful in that it can be applied without regard to variance differences between the classes. Ignoring these differences is not optimal, however, as the general SVM will give the class with lower variance an unjustifiably wide berth. This increases the chance of misclassification of the other class and results in an overall loss of predictive performance. An alternate construction is proposed in which the margins of the separating hyperplane are different for each class, each proportional to the standard deviation of its class along the direction perpendicular to the hyperplane. The construction agrees with the SVM in the case of equal class variances. The paper then examines the impact of the modified constraint equations on the dual representation.
1 A Recap: The Classical SVM Construction
For Section 1, we follow the construction given by Hastie, Tibshirani, and Friedman in The Elements of Statistical Learning [1]. We will parallel this approach in Section 2 when constructing the alternate method.
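Suppose we have training data consisting of $N$ pairs of observations and labels, $(x_i, y_i)$, for $i = 1, \dots, N$, with $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$. We may define a hyperplane by:
$$\{x : f(x) = x^T\beta + \beta_0 = 0\}$$
where $\beta$ is a vector perpendicular to the hyperplane. An associated classification rule is induced by:
$$G(x) = \operatorname{sign}\left[x^T\beta + \beta_0\right]$$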
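As a minimal sketch of the classification rule above, the following Python snippet evaluates $f(x) = x^T\beta + \beta_0$ and takes its sign for a hand-picked hyperplane. The function names, the example values of `beta` and `beta0`, and the convention of assigning points lying exactly on the hyperplane to class $+1$ are illustrative assumptions, not part of the original construction.

```python
def decision_value(x, beta, beta0):
    """Evaluate f(x) = x^T beta + beta_0 for one observation x."""
    return sum(xj * bj for xj, bj in zip(x, beta)) + beta0


def classify(x, beta, beta0):
    """Return G(x) = sign(f(x)) as +1 or -1.

    Points with f(x) == 0 are sent to +1 here; that tie-break is an assumption.
    """
    return 1 if decision_value(x, beta, beta0) >= 0 else -1


# Hypothetical hyperplane x_1 - x_2 + 0.5 = 0 in the plane.
beta = [1.0, -1.0]
beta0 = 0.5

labels = [classify(x, beta, beta0) for x in [[2.0, 0.0], [0.0, 2.0]]]
print(labels)
```

The first point gives $f(x) = 2.5$ and the second gives $f(x) = -1.5$, so they fall on opposite sides of the hyperplane.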
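For a linearly separable dataset, the goal of finding a separating hyperplane which maximizes the margin $M$, the minimum perpendicular distance to a datapoint of either class, can be formalized as:
$$\max_{\beta,\, \beta_0,\, \|\beta\| = 1} M \quad \text{subject to} \quad y_i(x_i^T\beta + \beta_0) \ge M, \quad i = 1, \dots, N.$$
This can be more conveniently rephrased by removing the requirement that $\beta$ be a unit vector, and setting $M = 1/\|\beta\|$:
$$\min_{\beta,\, \beta_0} \|\beta\| \quad \text{subject to} \quad y_i(x_i^T\beta + \beta_0) \ge 1, \quad i = 1, \dots, N.$$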
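The equivalence of the two formulations can be checked numerically. The sketch below, which uses assumed toy data and a hand-picked (not optimized) separating hyperplane, computes the geometric margin $M$ directly and then rescales $(\beta, \beta_0)$ so that the tightest constraint reads $y_i(x_i^T\beta + \beta_0) = 1$, after which $1/\|\beta\|$ reproduces the same margin. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def functional_margin(X, y, beta, beta0):
    """Smallest value of y_i * (x_i^T beta + beta_0) over the training set."""
    return min(yi * (dot(xi, beta) + beta0) for xi, yi in zip(X, y))


def geometric_margin(X, y, beta, beta0):
    """Smallest signed perpendicular distance from a training point to the hyperplane."""
    return functional_margin(X, y, beta, beta0) / norm(beta)


# Toy, linearly separable data (assumed values).
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]
beta, beta0 = [1.0, -1.0], 0.0

M = geometric_margin(X, y, beta, beta0)

# Rescale so that the tightest constraint becomes y_i * f(x_i) = 1.
scale = 1.0 / functional_margin(X, y, beta, beta0)
beta_scaled = [scale * b for b in beta]
beta0_scaled = scale * beta0

# The two printed values agree: M equals 1 / ||beta|| after rescaling.
print(M, 1.0 / norm(beta_scaled))
```

Rescaling changes neither the hyperplane nor the classification rule, which is why the unit-norm requirement can be traded for the normalization of the constraint.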
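Now define slack variables $\xi = (\xi_1, \dots, \xi_N)$ by
$$y_i(x_i^T\beta + \beta_0) \ge M(1 - \xi_i), \qquad \xi_i \ge 0, \quad i = 1, \dots, N.$$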
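To make the slack variables concrete, the sketch below computes, for a fixed hyperplane with unit normal vector and a chosen margin $M$, the smallest $\xi_i \ge 0$ satisfying $y_i(x_i^T\beta + \beta_0) \ge M(1 - \xi_i)$. The data, the margin value, and the function name `slack_fractions` are assumptions made for illustration only.

```python
def slack_fractions(X, y, beta, beta0, margin):
    """Smallest xi_i >= 0 with y_i * (x_i^T beta + beta_0) >= margin * (1 - xi_i).

    Assumes beta is a unit vector, so y_i * f(x_i) is a signed distance.
    """
    slacks = []
    for xi, yi in zip(X, y):
        signed_distance = yi * (sum(a * b for a, b in zip(xi, beta)) + beta0)
        slacks.append(max(0.0, 1.0 - signed_distance / margin))
    return slacks


# Assumed toy data: the third point lies on the wrong side of the hyperplane.
X = [[3.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-2.0, 1.0]]
y = [1, 1, 1, -1]
beta, beta0, margin = [1.0, 0.0], 0.0, 2.0

slacks = slack_fractions(X, y, beta, beta0, margin)
misclassified = [s > 1.0 for s in slacks]
total_slack = sum(slacks)
print(slacks, misclassified, total_slack)
```

In this example the second point sits halfway inside its margin (slack 0.5) but is still correctly classified, while the third point has slack 1.5 and is misclassified.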
This gives us a framework to relax the assumption of linear separability. Noting that misclassifications occur when $\xi_i > 1$, we see that the slack variables are the proportions of the margin by which the various points fall within their respective margins. We may control the amount of slack by imposing the additional condition:
$$\sum_{i=1}^{N} \xi_i \le K$$
for some constant $K$. This is computationally equivalent to the following expression:
$$\min_{\beta,\, \beta_0} \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad \xi_i \ge 0, \quad y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i \quad \forall i,$$
where the parameter $C$ replaces the constant $K$ in the previous expression. The corresponding Lagrange primal function is given by:
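$$L_P = \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i\left[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\right] - \sum_{i=1}^{N} \mu_i\xi_i$$
which is to be minimized with respect to $\beta$, $\beta_0$ and $\xi_i$. Setting the respective derivatives equal to zero, we get the equations:
$$\beta = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad 0 = \sum_{i=1}^{N} \alpha_i y_i, \qquad \alpha_i = C - \mu_i \quad \forall i,$$
and positivity constraints $\alpha_i, \mu_i, \xi_i \ge 0$. By substituting the above three equations into the Lagrangian primal function we obtain the Wolfe dual, given by:
$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{i'=1}^{N} \alpha_i\alpha_{i'}y_iy_{i'}x_i^Tx_{i'}$$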
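The substitution step can be written out explicitly. The worked derivation below assumes the standard soft-margin primal function of Hastie, Tibshirani, and Friedman, restated on its first line, with multipliers $\alpha_i$ for the margin constraints and $\mu_i$ for the constraints $\xi_i \ge 0$. The second line only regroups terms; the third applies the three stationarity conditions $\beta = \sum_i \alpha_i y_i x_i$, $\sum_i \alpha_i y_i = 0$ and $\alpha_i = C - \mu_i$.

```latex
\begin{align*}
L_P &= \tfrac{1}{2}\|\beta\|^2 + C\sum_i \xi_i
       - \sum_i \alpha_i\left[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\right]
       - \sum_i \mu_i \xi_i \\
    &= \tfrac{1}{2}\beta^T\beta - \beta^T\sum_i \alpha_i y_i x_i
       - \beta_0 \sum_i \alpha_i y_i + \sum_i \alpha_i
       + \sum_i (C - \alpha_i - \mu_i)\,\xi_i \\
    &= \tfrac{1}{2}\beta^T\beta - \beta^T\beta - 0 + \sum_i \alpha_i + 0 \\
    &= \sum_i \alpha_i - \tfrac{1}{2}\sum_i \sum_{i'}
       \alpha_i \alpha_{i'} y_i y_{i'} x_i^T x_{i'} \;=\; L_D .
\end{align*}
```

The slack variables and $\beta_0$ drop out entirely, leaving an objective that depends on the data only through the inner products $x_i^T x_{i'}$.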
In addition, the Karush-Kuhn-Tucker conditions yield:
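$$\alpha_i\left[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\right] = 0,$$
$$\mu_i\xi_i = 0,$$
$$y_i(x_i^T\beta + \beta_0) - (1 - \xi_i) \ge 0,$$
for $i = 1, \dots, N$. Together, these equations uniquely characterize the solution to the dual problem.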
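As a small numerical check of these conditions, the sketch below takes a two-point toy problem whose dual variables were worked out by hand (an assumption of this example, not a solver output), recovers $\beta$ and $\beta_0$ from them, and confirms that the primal and dual objectives coincide and that complementary slackness holds. The helper names are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def recover_beta(alpha, X, y):
    """Stationarity condition: beta = sum_i alpha_i * y_i * x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[j] for a, xi, yi in zip(alpha, X, y)) for j in range(dim)]


def dual_objective(alpha, X, y):
    """Wolfe dual: sum_i alpha_i - 1/2 sum_i sum_k alpha_i alpha_k y_i y_k x_i^T x_k."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[k] * y[i] * y[k] * dot(X[i], X[k])
               for i in range(n) for k in range(n))
    return sum(alpha) - 0.5 * quad


# Two-point toy problem with hand-derived dual variables (assumed values).
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]
C = 10.0

beta = recover_beta(alpha, X, y)
# A point with 0 < alpha_i < C has xi_i = 0 and sits exactly on its margin,
# so y_i * (x_i^T beta + beta_0) = 1 gives beta_0 = y_i - x_i^T beta.
beta0 = y[0] - dot(X[0], beta)

slacks = [max(0.0, 1.0 - yi * (dot(xi, beta) + beta0)) for xi, yi in zip(X, y)]
primal = 0.5 * dot(beta, beta) + C * sum(slacks)
dual = dual_objective(alpha, X, y)

# Complementary slackness: alpha_i * [y_i f(x_i) - (1 - xi_i)] should vanish.
kkt_residuals = [a * (yi * (dot(xi, beta) + beta0) - (1.0 - s))
                 for a, xi, yi, s in zip(alpha, X, y, slacks)]
print(beta, beta0, primal, dual, kkt_residuals)
```

Both objectives evaluate to 0.5 here, and both points are support vectors lying exactly on their margins.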
2 A Modified Approach: Accommodating Difference in Class Variance
The original construction of the SVM for linearly separable data has the goal of maximizing the margin $M$. In the event of a noticeable difference between class variances in the direction of $\beta$ (perpendicular to our separating hyperplane), the SVM ends up positioning the decision boundary closer to the class with larger variance (say, class $A$) than would be optimal. The new construction accommodates these class imbalances by increasing the margin of the class of greater variance.
It will be useful at this point to define a few terms. For class $c$, element $x_i$, and separating hyperplane $(\beta, \beta_0)$, define $\sigma_c$ to be the standard deviation of the elements of class $c$ in the direction of $\beta$:
[TABLE]
and, for class $c$ and an arbitrary hyperplane $(\beta, \beta_0)$, define the margin of class $c$ to be:
[TABLE]
We will now seek the separating hyperplane which maximizes the minimum margin over all classes. As an aside, a byproduct of the classic construction of the SVM is that the margins of the two classes being separated, say $A$ and $B$, are equal, since the maximum margin is obtained when the separating hyperplane is midway between both classes. Our new construction will yield as a byproduct the equality:
[TABLE]
This shows that in the event our classes have equal variance in the direction of $\beta$, the modified construction coincides with the classical SVM.
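The idea of variance-proportional margins can be illustrated for a fixed direction $\beta$. The sketch below is not the paper's optimization procedure. It assumes (i) the population form of the standard deviation, dividing by the class size, (ii) that the margin of a class is the smallest perpendicular distance from its points to the boundary, and (iii) that the classes are separable along $\beta$. Under those assumptions it places the boundary so that each class margin is proportional to that class's standard deviation along $\beta$. The names `directional_std` and `proportional_threshold` are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def projections(points, beta):
    """Coordinates of each point along the unit vector beta / ||beta||."""
    length = math.sqrt(dot(beta, beta))
    return [dot(x, beta) / length for x in points]


def directional_std(points, beta):
    """Population standard deviation of a class along beta (1/N_c assumed)."""
    proj = projections(points, beta)
    mean = sum(proj) / len(proj)
    return math.sqrt(sum((p - mean) ** 2 for p in proj) / len(proj))


def proportional_threshold(class_pos, class_neg, beta):
    """Threshold t along beta with (a - t) / sigma_pos == (t - b) / sigma_neg.

    a is the smallest projection of the positive class and b the largest
    projection of the negative class; a > b is assumed.
    """
    a = min(projections(class_pos, beta))
    b = max(projections(class_neg, beta))
    s_pos = directional_std(class_pos, beta)
    s_neg = directional_std(class_neg, beta)
    return (s_neg * a + s_pos * b) / (s_pos + s_neg)


class_A = [[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]      # wide class, label +1
class_B = [[-1.0, 0.0], [-1.5, 1.0], [-2.0, 0.0]]   # narrow class, label -1
beta = [1.0, 0.0]

sigma_A = directional_std(class_A, beta)
sigma_B = directional_std(class_B, beta)
t = proportional_threshold(class_A, class_B, beta)
margin_A = min(projections(class_A, beta)) - t
margin_B = t - max(projections(class_B, beta))
midpoint = 0.5 * (min(projections(class_A, beta)) + max(projections(class_B, beta)))
print(sigma_A, sigma_B, t, margin_A, margin_B, midpoint)
```

In this example class $A$ is four times as spread out as class $B$ along $\beta$. The classical midpoint boundary sits at 0 with both margins equal to 1, whereas the variance-scaled boundary moves to about $-0.6$, giving margins of about 1.6 and 0.4. When the two standard deviations are equal, the same formula returns the midpoint.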
3 Examining Implications to Dual Representation
Maximizing the minimum class margin modifies the optimization problem to the pair of equations:
[TABLE]
Slightly redefining slack variables according to the fraction of the respective margins they span yields:
[TABLE]
and the corresponding modified SVM equations are given by:
[TABLE]
We can now formulate the corresponding Lagrangian (primal) function as:
[TABLE]
which we again minimize with respect to $\beta$, $\beta_0$ and $\xi_i$. Setting derivatives with respect to $\beta_0$ and $\xi_i$ equal to zero, we get similar results:
[TABLE]
and a slightly more complex equation when doing the same with respect to $\beta$:
[TABLE]
Expanding the class standard deviation to its representation in (21), we may utilize the Hadamard product notation and the fact
[TABLE]
where $\circ$ is the Hadamard product, to obtain:
[TABLE]
where $\mathbf{1}$ is the vector of ones $[1, \dots, 1]$.
This gives us a working representation of the equivalent dual optimization equations under the new construction. A forthcoming paper will examine the solvability of the above in general in light of the other constraint equations, as well as the consequent impact on the kernelizability of the method. We will also examine in depth, and attempt to quantify, the circumstances in which our alternate construction outperforms a traditional Support Vector Classifier.
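One plausible reason the derivative with respect to $\beta$ is more involved than in the classical case is that the class standard deviation itself depends on $\beta$. As a hedged illustration, assume it can be written as $\sigma_c(\beta) = \sqrt{\beta^T \Sigma_c \beta} / \|\beta\|$, where $\Sigma_c$ is the covariance matrix of class $c$. This closed form, the example matrix `cov_A`, and the function names are assumptions and are not taken from the paper. Under that assumption the gradient is $(\Sigma_c\beta/\sigma_c - \sigma_c\beta)/\|\beta\|^2$, which the sketch below compares against central finite differences.

```python
import math


def sigma(beta, cov):
    """Assumed form: sqrt(beta^T cov beta) / ||beta||."""
    n = len(beta)
    quad = sum(beta[i] * cov[i][j] * beta[j] for i in range(n) for j in range(n))
    return math.sqrt(quad / sum(b * b for b in beta))


def sigma_grad(beta, cov):
    """Analytic gradient of sigma with respect to beta under the assumed form."""
    n = len(beta)
    norm_sq = sum(b * b for b in beta)
    s = sigma(beta, cov)
    cov_beta = [sum(cov[i][j] * beta[j] for j in range(n)) for i in range(n)]
    return [(cov_beta[i] / s - s * beta[i]) / norm_sq for i in range(n)]


def numeric_grad(beta, cov, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(beta)):
        up, down = list(beta), list(beta)
        up[i] += h
        down[i] -= h
        grad.append((sigma(up, cov) - sigma(down, cov)) / (2.0 * h))
    return grad


cov_A = [[4.0, 1.0], [1.0, 2.0]]   # hypothetical class covariance
beta = [1.0, 2.0]

analytic = sigma_grad(beta, cov_A)
numeric = numeric_grad(beta, cov_A)
radial_component = sum(b * g for b, g in zip(beta, analytic))
print(analytic, numeric, radial_component)
```

The gradient is orthogonal to $\beta$, reflecting the fact that rescaling $\beta$ leaves the directional standard deviation unchanged.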
References
- [1] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, New York, 2009.
- [2] Andrew Ng. CS 229 Lecture Notes. [http://cs229.stanford.edu/notes/cs229-notes3.pdf]
- [3] Robert Gunn. Support Vector Machines for Classification and Regression. Technical Report for University of Southampton, Southampton, England, 1998.
