Distributed Maximization of Submodular plus Diversity Functions for Multi-label Feature Selection on Huge Datasets
Mehrdad Ghadiri, Mark Schmidt

TL;DR
This paper introduces a distributed greedy algorithm that maximizes the sum of a diversity function and a submodular function, enabling multi-label feature selection on large-scale datasets with results competitive with centralized methods.
Contribution
It presents the first distributed multi-label feature selection method based on maximizing a sum of diversity and submodular functions, suitable for big data environments.
Findings
The greedy algorithm achieves a constant-factor approximation of the optimal solution in distributed and streaming settings.
The method performs comparably to or better than centralized approaches.
It effectively handles large-scale, high-dimensional data.
Abstract
There are many problems in machine learning and data mining which are equivalent to selecting a non-redundant, high "quality" set of objects. Recommender systems, feature selection, and data summarization are among many applications of this. In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function. The diversity function addresses the redundancy, and the submodular function controls the predictive quality. We consider the problem in big data settings (in other words, distributed and streaming settings) where the data cannot be stored on a single machine or the process time is too high for a single machine. We show that a greedy algorithm achieves a constant factor approximation of the optimal solution in these settings. Moreover, we formulate the multi-label feature…
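The objective described in the abstract can be illustrated with a small sketch. It assumes the combined objective has the form f(S) + lam * (sum of pairwise distances within S) under a cardinality limit k, and that the distributed variant runs a local greedy pass on each partition followed by a second greedy pass over the merged local picks. The names `objective`, `greedy`, `distributed_greedy`, and `lam`, the two-round scheme, and the toy coverage data are all hypothetical and chosen for illustration; the paper's actual selection rule, constraints, and guarantees may differ.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Sequence, Set

Item = Hashable
SetFn = Callable[[Set[Item]], float]          # monotone submodular quality term
DistFn = Callable[[Item, Item], float]        # pairwise distance for diversity


def objective(S: Sequence[Item], f: SetFn, d: DistFn, lam: float) -> float:
    """Assumed objective: f(S) plus lam times the sum of pairwise distances."""
    return f(set(S)) + lam * sum(d(u, v) for u, v in combinations(S, 2))


def greedy(candidates: Sequence[Item], k: int, f: SetFn, d: DistFn,
           lam: float) -> List[Item]:
    """Repeatedly add the item with the largest marginal gain, up to k items."""
    selected: List[Item] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = f(set(selected))

        def gain(u: Item) -> float:
            quality = f(set(selected) | {u}) - base
            diversity = sum(d(u, v) for v in selected)
            return quality + lam * diversity

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


def distributed_greedy(partitions: Sequence[Sequence[Item]], k: int, f: SetFn,
                       d: DistFn, lam: float) -> List[Item]:
    """Two-round sketch: local greedy per partition, then greedy on the union.

    Each local call only reads its own partition, so in a real deployment
    the first round could run on separate machines.
    """
    local = [greedy(part, k, f, d, lam) for part in partitions]
    merged = [u for sol in local for u in sol]
    final = greedy(merged, k, f, d, lam)
    # Return the best of the merged solution and every local solution.
    return max(local + [final], key=lambda S: objective(S, f, d, lam))


# Toy data (hypothetical): each item covers some labels and sits on a line.
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"},
          3: {"d", "e"}, 4: {"a"}, 5: {"e"}}
POS = {0: 0.0, 1: 1.0, 2: 2.0, 3: 5.0, 4: 0.5, 5: 6.0}


def coverage(S: Set[Item]) -> float:
    """Number of distinct labels covered: monotone and submodular."""
    return float(len(set().union(*(COVERS[u] for u in S))))


def distance(u: Item, v: Item) -> float:
    return abs(POS[u] - POS[v])


result = distributed_greedy([[0, 1, 2], [3, 4, 5]], k=2,
                            f=coverage, d=distance, lam=0.1)
```

Here `lam` trades off predictive quality against redundancy: larger values favor items that are far apart, smaller values favor items that add the most coverage.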
Taxonomy
Topics: Text and Document Classification Technologies · Spam and Phishing Detection · Machine Learning and Algorithms
