Fast semi-supervised discriminant analysis for binary classification of large data-sets
Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

TL;DR
This paper introduces three scalable Krylov subspace-based algorithms for semi-supervised discriminant analysis, significantly reducing computation time while maintaining good predictive performance on large, high-dimensional datasets.
Contribution
The paper presents novel scalable algorithms for semi-supervised discriminant analysis that leverage Krylov subspace methods and data centralization, improving efficiency for large datasets.
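Centralization (mean-centering) would normally destroy the sparsity of the data matrix, because subtracting the column means fills in the zeros. A standard way around this, assumed here rather than taken from the paper, is to keep X sparse and apply the centering implicitly inside every matrix-vector product, using (X - 1μᵀ)v = Xv - 1(μᵀv) and (X - 1μᵀ)ᵀu = Xᵀu - μ(1ᵀu). The sketch below wraps this in SciPy's `LinearOperator`; the helper name `centered_operator` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator


def centered_operator(X):
    """Return a LinearOperator acting as X - 1 mu^T without densifying X.

    X is an (n_samples, n_features) sparse matrix (hypothetical helper);
    mu holds its column means.
    """
    X = sp.csr_matrix(X)
    n, d = X.shape
    mu = np.asarray(X.mean(axis=0)).ravel()  # column means, shape (d,)
    ones = np.ones(n)

    def matvec(v):
        v = np.asarray(v).ravel()
        # (X - 1 mu^T) v = X v - 1 (mu . v): only sparse products plus a rank-one fix
        return X @ v - ones * (mu @ v)

    def rmatvec(u):
        u = np.asarray(u).ravel()
        # (X - 1 mu^T)^T u = X^T u - mu (1 . u)
        return X.T @ u - mu * u.sum()

    return LinearOperator((n, d), matvec=matvec, rmatvec=rmatvec, dtype=float)
```

The resulting operator can be passed to the iterative solvers in `scipy.sparse.linalg`, so the centered matrix is never stored explicitly and memory stays proportional to the number of nonzeros in X.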
Findings
Achieves good predictive performance on industry-scale pharmaceutical data
Methods require only a few seconds to compute, significantly faster than previous state-of-the-art approaches
Exploits data sparsity and the shift-invariance of Krylov subspaces
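The shift-invariance referred to here is the identity K_k(A, b) = K_k(A + σI, b): adding a multiple of the identity does not change the Krylov subspace. As a consequence, a single Lanczos run on a symmetric A can provide approximate solutions of (A + σI)x = b for many shifts σ, each requiring only a small k × k tridiagonal solve. The NumPy sketch below illustrates this general technique (for instance, sweeping a ridge-type regularization parameter); it is not the paper's code, and `lanczos` and `shifted_solves` are hypothetical names.

```python
import numpy as np


def lanczos(A, b, k, tol=1e-12):
    """k steps of Lanczos on a symmetric matrix A, started from vector b.

    Returns Q (n x m, orthonormal columns) and the m x m tridiagonal T = Q^T A Q,
    with m <= k (m < k only if an invariant subspace is found early).
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = b / np.linalg.norm(b)
    m = k
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        # Full reorthogonalization (applied twice) against all Lanczos vectors so far;
        # this subsumes the usual three-term recurrence and keeps Q orthonormal.
        for _ in range(2):
            w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == k - 1:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < tol:  # absolute threshold: subspace is numerically invariant
            m = j + 1
            break
        q = w / beta[j]
    T = np.diag(alpha[:m]) + np.diag(beta[: m - 1], 1) + np.diag(beta[: m - 1], -1)
    return Q[:, :m], T


def shifted_solves(A, b, shifts, k):
    """Approximate x(s) = (A + s I)^{-1} b for every shift s from ONE Lanczos run.

    Since K_k(A, b) == K_k(A + s I, b), Q and T are computed once; each shift
    only needs the small system (T + s I) y = ||b|| e_1, then x = Q y.
    """
    Q, T = lanczos(A, b, k)
    m = T.shape[0]
    e1 = np.zeros(m)
    e1[0] = np.linalg.norm(b)
    # Dense solve for brevity; a tridiagonal solver would be used at scale.
    return {s: Q @ np.linalg.solve(T + s * np.eye(m), e1) for s in shifts}
```

Because the basis is built once, each additional shift costs only a small k × k solve plus one product with Q, instead of a fresh iterative solve from scratch.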
Abstract
High-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods, which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, we improve the problem definition by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and that our methods require only a few seconds, a significant improvement in computation time over the previous state of the art.
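For two classes, one common formulation of SDA maximizes wᵀS_b w / wᵀ(S_t + αXᵀLX + βI)w, where S_b and S_t are the between-class and total scatter of the labeled samples, L is a graph Laplacian over all labeled and unlabeled samples (rows of X), α ≥ 0 and β > 0. Because the binary S_b is rank one, proportional to (μ₁ - μ₀)(μ₁ - μ₀)ᵀ, the optimal direction is w ∝ (S_t + αXᵀLX + βI)⁻¹(μ₁ - μ₀), so training reduces to one symmetric positive-definite linear system. The sketch below solves it with conjugate gradients, a Krylov method, using only sparse matrix-vector products and implicit centering. The exact objective, graph construction, and solvers in the paper may differ; `sda_direction` and its parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg


def sda_direction(X, labeled, y, L, alpha=1.0, beta=1e-3):
    """Binary SDA direction from one SPD linear system (hypothetical sketch).

    X       : (n, d) sparse data, labeled and unlabeled rows together
    labeled : integer indices of the labeled rows
    y       : 0/1 labels for those rows
    L       : (n, n) sparse graph Laplacian over all rows (assumed given)
    Solves (S_t + alpha X^T L X + beta I) w = mu_1 - mu_0 with CG;
    no dense d x d matrix and no dense centered copy of X are formed.
    """
    X = sp.csr_matrix(X)
    y = np.asarray(y)
    Xl = X[np.asarray(labeled)]
    n_l, d = Xl.shape
    mu = np.asarray(Xl.mean(axis=0)).ravel()
    mu1 = np.asarray(Xl[np.flatnonzero(y == 1)].mean(axis=0)).ravel()
    mu0 = np.asarray(Xl[np.flatnonzero(y == 0)].mean(axis=0)).ravel()

    def matvec(v):
        v = np.asarray(v).ravel()
        # S_t v = Xc^T Xc v / n_l, with Xc = Xl - 1 mu^T applied implicitly
        t = Xl @ v - (mu @ v)
        st_v = (Xl.T @ t - mu * t.sum()) / n_l
        # Graph term: centering is unnecessary here because L @ ones = 0
        g_v = X.T @ (L @ (X @ v))
        return st_v + alpha * g_v + beta * v

    A = LinearOperator((d, d), matvec=matvec, dtype=float)
    w, info = cg(A, mu1 - mu0)
    if info != 0:
        raise RuntimeError(f"CG did not converge (info={info})")
    return w
```

A new sample x would then be scored by the projection wᵀx, with a threshold chosen on the labeled data; that last step is likewise an assumption about how the direction is used.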
