UniPROT: Uniform Prototype Selection via Partial Optimal Transport with Submodular Guarantees

Prateek Chanda; Prayas Agrawal; Karthik S. Gurumoorthy; Ganesh Ramakrishnan; Bamdev Mishra; Pratik Jawanpuria

arXiv:2604.10952·cs.LG·April 14, 2026

TL;DR

UniPROT introduces a theoretically grounded, scalable method for selecting uniformly weighted prototypes using partial optimal transport, improving minority class representation in imbalanced classification tasks.

Contribution

The paper reformulates the optimal transport marginal constraints to obtain a submodular partial optimal transport objective, which enables efficient greedy selection with guarantees, and it demonstrates improved minority-class performance.
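To make the greedy step concrete, here is a minimal sketch of greedy selection under a cardinality constraint. It does not implement the paper's partial-OT objective: a facility-location function is swapped in as a stand-in monotone submodular objective, and the names `facility_location`, `greedy_select`, and the toy similarity matrix `sim` are hypothetical. For monotone submodular functions in general, the classical greedy algorithm is known to achieve a (1 - 1/e) approximation; which guarantee the paper proves for its own objective is not stated in this listing.

```python
import numpy as np

def facility_location(sim, selected):
    """Stand-in monotone submodular objective (NOT the paper's objective):
    each target point is credited with its best similarity to any selected
    source point. Assumes sim is non-negative."""
    if not selected:
        return 0.0
    return float(sim[selected, :].max(axis=0).sum())

def greedy_select(sim, k):
    """Pick k source indices by repeatedly adding the largest marginal gain."""
    n_source = sim.shape[0]
    selected = []
    for _ in range(k):
        base = facility_location(sim, selected)
        best_gain, best_idx = -np.inf, None
        for i in range(n_source):
            if i in selected:
                continue
            gain = facility_location(sim, selected + [i]) - base
            if gain > best_gain:
                best_gain, best_idx = gain, i
        selected.append(best_idx)
    return selected

# Toy data: 3 candidate source points (rows) vs. 4 target points (columns).
sim = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.2, 0.9, 0.7],
    [0.5, 0.5, 0.5, 0.5],
])
prototypes = greedy_select(sim, k=2)

# Uniform prototype weights: every selected prototype gets mass 1/k.
weights = np.full(len(prototypes), 1.0 / len(prototypes))
print(prototypes, weights)
```

The last two lines mirror the uniform-weight idea: the selected prototypes are not re-weighted by importance, so each carries the same mass.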

Findings

01

Enforcing uniform prototype weights improves minority-class representation.

02

Achieves robust performance gains in language models under domain imbalance.

03

Provides a scalable, theoretically justified prototype selection method.

Abstract

Selecting prototypical examples from a source distribution to represent a target data distribution is a fundamental problem in machine learning. Existing subset selection methods often rely on implicit importance scores, which can be skewed towards majority classes and lead to low-quality prototypes for minority classes. We present UniPROT, a novel subset selection framework that minimizes the optimal transport (OT) distance between a uniformly weighted prototypical distribution and the target distribution. While intuitive, this formulation leads to a cardinality-constrained maximization of a super-additive objective, which is generally intractable to approximate efficiently. To address this, we propose a principled reformulation of the OT marginal constraints, yielding a partial optimal transport-based submodular objective. We prove that this reformulation enables a greedy…
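For orientation, the uniform-weight selection problem described in the abstract can be written in standard Kantorovich form. The notation here is assumed for illustration and may differ from the paper's: $\mathcal{X}$ is the source set, $S$ a candidate subset of size $k$, $C$ a cost matrix between source and target points, $q$ the target weights over $n$ target points, and $\gamma$ the transport plan. The partial-OT reformulation itself is not reproduced, since the abstract is truncated in this listing.

```latex
% Assumed notation, not taken from the paper.
\min_{S \subseteq \mathcal{X},\; |S| = k} \;\;
\min_{\gamma \in \mathbb{R}_{\ge 0}^{k \times n}}
\sum_{i \in S} \sum_{j=1}^{n} C_{ij}\, \gamma_{ij}
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \gamma_{ij} = \frac{1}{k} \;\; \forall i \in S,
\qquad
\sum_{i \in S} \gamma_{ij} = q_j \;\; \forall j \in \{1, \dots, n\}.
```

The first marginal constraint is what makes the prototypes uniformly weighted: every selected point must ship exactly $1/k$ of the total mass.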

Code & Models

Repositories

efficiency-learning/UniPROT (GitHub)
