# Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

**Authors:** Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt, Elena Gaura

arXiv: 1703.08434 · 2017-03-27

## TL;DR

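This paper introduces the Gaussian Linear Discriminant (GLD), a linear classifier for heteroscedastic data that directly minimizes the Bayes error for binary classification. Under heteroscedasticity it is more accurate than the original LDA. It compares favourably in accuracy with other heteroscedastic LDA approaches and, combined with a local search, with the SVM, while training considerably faster on some datasets.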

## Contribution

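The paper derives the GLD, a linear classifier that minimizes the Bayes error for binary classification when the two classes have unequal covariances. It also proposes a local neighbourhood search (LNS) algorithm that yields a more robust classifier when the data are known to be non-normal. On some datasets the GLD trains up to 60 times faster than other heteroscedastic LDA approaches.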
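To make the objective concrete, consider a rule that predicts class 1 when $w^\top x > c$, where class $k \sim \mathcal{N}(\mu_k, \Sigma_k)$ has prior $p_k$. The rule's misclassification probability is $p_0\,[1-\Phi((c - w^\top\mu_0)/\sqrt{w^\top\Sigma_0 w})] + p_1\,\Phi((c - w^\top\mu_1)/\sqrt{w^\top\Sigma_1 w})$. The sketch below evaluates this error for the LDA rule and for a rule found by brute-force search over directions and thresholds in two dimensions. It illustrates the minimum-error idea behind the GLD under these assumptions. It is not the authors' optimisation procedure, and the function names and example parameters are hypothetical.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def quad(w, S):
    # w^T S w for a 2-vector w and a 2x2 matrix S given as nested tuples.
    return (w[0] * (S[0][0] * w[0] + S[0][1] * w[1])
            + w[1] * (S[1][0] * w[0] + S[1][1] * w[1]))

def linear_error(w, c, mu0, S0, mu1, S1, p0=0.5):
    """Error of 'predict class 1 iff w.x > c' when class k ~ N(mu_k, S_k)."""
    m0 = w[0] * mu0[0] + w[1] * mu0[1]
    m1 = w[0] * mu1[0] + w[1] * mu1[1]
    s0, s1 = math.sqrt(quad(w, S0)), math.sqrt(quad(w, S1))
    # Class 0 sent to class 1 (w.x > c) plus class 1 sent to class 0 (w.x <= c).
    return p0 * (1.0 - norm_cdf((c - m0) / s0)) + (1.0 - p0) * norm_cdf((c - m1) / s1)

def lda_rule(mu0, S0, mu1, S1):
    # LDA: inverse pooled covariance times mean difference; threshold at the
    # projected midpoint of the means (equal priors assumed).
    a = (S0[0][0] + S1[0][0]) / 2
    b = (S0[0][1] + S1[0][1]) / 2
    d = (S0[1][1] + S1[1][1]) / 2
    det = a * d - b * b
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    w = ((d * dx - b * dy) / det, (-b * dx + a * dy) / det)
    c = (w[0] * (mu0[0] + mu1[0]) + w[1] * (mu0[1] + mu1[1])) / 2
    return w, c

def min_error_rule(mu0, S0, mu1, S1, n_angles=360, n_thresh=200):
    # Start from the LDA rule so the search can only match or improve on it.
    w, c = lda_rule(mu0, S0, mu1, S1)
    best = (linear_error(w, c, mu0, S0, mu1, S1), w, c)
    for i in range(n_angles):
        t = 2 * math.pi * i / n_angles          # full circle: w and -w differ
        w = (math.cos(t), math.sin(t))
        m0 = w[0] * mu0[0] + w[1] * mu0[1]
        m1 = w[0] * mu1[0] + w[1] * mu1[1]
        s = max(math.sqrt(quad(w, S0)), math.sqrt(quad(w, S1)))
        lo, hi = min(m0, m1) - 4 * s, max(m0, m1) + 4 * s
        for j in range(n_thresh + 1):           # grid of candidate thresholds
            c = lo + (hi - lo) * j / n_thresh
            err = linear_error(w, c, mu0, S0, mu1, S1)
            if err < best[0]:
                best = (err, w, c)
    return best

# Heteroscedastic example: class 1 is wider along x and narrower along y.
mu0, S0 = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
mu1, S1 = (2.0, 0.0), ((4.0, 0.0), (0.0, 0.25))
w_lda, c_lda = lda_rule(mu0, S0, mu1, S1)
lda_err = linear_error(w_lda, c_lda, mu0, S0, mu1, S1)
gld_err, w_gld, c_gld = min_error_rule(mu0, S0, mu1, S1)
print(f"LDA error {lda_err:.4f}, minimum-error linear rule {gld_err:.4f}")
```

Here LDA places its threshold at the projected midpoint of the means. The search is free to move it toward the wider class-1 distribution, so its error can be no larger than the LDA error.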

## Key findings

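- GLD achieves higher classification accuracy than the original LDA under heteroscedasticity.
- GLD compares favourably in accuracy with other heteroscedastic LDA approaches while requiring up to 60 times less training time on some datasets.
- GLD combined with LNS reaches accuracy equivalent to the SVM on some datasets with up to 150 times less training time.
- The classifiers are evaluated on two artificial and ten real-world datasets, spanning handwriting recognition, medical diagnosis and remote sensing.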

## Abstract

Under normality and homoscedasticity assumptions, Linear Discriminant Analysis (LDA) is known to be optimal in terms of minimising the Bayes error for binary classification. In the heteroscedastic case, LDA is not guaranteed to minimise this error. Assuming heteroscedasticity, we derive a linear classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the Bayes error for binary classification. In addition, we also propose a local neighbourhood search (LNS) algorithm to obtain a more robust classifier if the data is known to have a non-normal distribution. We evaluate the proposed classifiers on two artificial and ten real-world datasets that cut across a wide range of application areas including handwriting recognition, medical diagnosis and remote sensing, and then compare our algorithm against existing LDA approaches and other linear classifiers. The GLD is shown to outperform the original LDA procedure in terms of the classification accuracy under heteroscedasticity. While it compares favourably with other existing heteroscedastic LDA approaches, the GLD requires as much as 60 times lower training time on some datasets. Our comparison with the support vector machine (SVM) also shows that the GLD, together with the LNS, requires as much as 150 times lower training time to achieve an equivalent classification accuracy on some of the datasets. Thus, our algorithms can provide a cheap and reliable option for classification in many expert systems.
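The abstract does not say how the LNS works. As a rough illustration only, the sketch below assumes that an LNS-style refinement is a random local search over the weights and bias of a linear rule. It accepts a neighbouring rule only when that rule lowers the empirical training error, so it makes no normality assumption. The function names, step-size schedule and skewed toy data are hypothetical choices, not the paper's algorithm.

```python
import random

def empirical_error(w, b, X, y):
    # Fraction of points where 'predict 1 iff w.x + b > 0' disagrees with y in {0, 1}.
    wrong = 0
    for x, label in zip(X, y):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        wrong += pred != label
    return wrong / len(y)

def local_neighbourhood_search(w, b, X, y, step=0.5, iters=1000, seed=0):
    """Hypothetical LNS-style refinement: perturb (w, b) at random, keep a
    candidate only if it lowers training error, and halve the step size
    after 100 consecutive failed attempts."""
    rng = random.Random(seed)
    best_err = empirical_error(w, b, X, y)
    stall = 0
    for _ in range(iters):
        cand_w = [wi + rng.gauss(0.0, step) for wi in w]
        cand_b = b + rng.gauss(0.0, step)
        err = empirical_error(cand_w, cand_b, X, y)
        if err < best_err:
            w, b, best_err, stall = cand_w, cand_b, err, 0
        else:
            stall += 1
            if stall >= 100:
                step *= 0.5  # narrow the neighbourhood
                stall = 0
    return w, b, best_err

# Skewed (non-normal) toy data: exponential tails along x, uniform along y.
rng = random.Random(1)
X0 = [(rng.expovariate(1.0), rng.uniform(-1, 1)) for _ in range(200)]
X1 = [(4.0 - rng.expovariate(1.0), rng.uniform(-1, 1)) for _ in range(200)]
X, y = X0 + X1, [0] * 200 + [1] * 200

w0, b0 = [1.0, 0.0], -1.0  # hand-picked starting rule: x > 1
err_before = empirical_error(w0, b0, X, y)
w_new, b_new, err_after = local_neighbourhood_search(w0, b0, X, y)
print(f"training error {err_before:.3f} -> {err_after:.3f}")
```

Only improving moves are accepted, so the training error cannot increase. One natural way to combine the two methods would be to start the search from a GLD solution instead of the hand-picked rule used here, though that pairing is likewise an assumption.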

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.08434/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1703.08434/full.md

---
Source: https://tomesphere.com/paper/1703.08434