PECOS: Prediction for Enormous and Correlated Output Spaces
Hsiang-Fu Yu, Kai Zhong, Jiong Zhang, Wei-Cheng Chang, and Inderjit S. Dhillon

TL;DR
PECOS is a modular machine learning framework designed to efficiently predict and rank relevant items from enormous, correlated output spaces, significantly improving real-time performance in large-scale applications.
Contribution
The paper introduces PECOS, a versatile three-phase framework. It organizes the output space (semantic indexing), narrows it to a small candidate set (matching), and ranks those candidates (ranking), which enables real-time predictions over enormous, correlated label sets.
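The three phases can be illustrated with a minimal pure-Python sketch. It is not the PECOS library or the paper's models. It assumes each label is already represented by a vector (the toy vectors below are invented), uses plain k-means as a stand-in indexer, and uses dot-product scoring as a stand-in for the learned matcher and ranker. The names `build_index` and `predict` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_index(labels: Dict[str, Vector], num_clusters: int,
                iters: int = 10) -> Tuple[List[List[float]], List[List[str]]]:
    """Phase 1 (indexing, hypothetical stand-in): group similar labels with plain k-means."""
    names = sorted(labels)
    # Deterministic initialization: the first num_clusters labels seed the centroids.
    centroids = [list(labels[n]) for n in names[:num_clusters]]
    clusters: List[List[str]] = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for n in names:
            # Assign each label to its nearest centroid (squared Euclidean distance).
            best = min(range(len(centroids)),
                       key=lambda c: sq_dist(labels[n], centroids[c]))
            clusters[best].append(n)
        # Recompute each centroid; keep the old one if a cluster went empty.
        centroids = [mean([labels[n] for n in members]) if members else centroids[c]
                     for c, members in enumerate(clusters)]
    return centroids, clusters


def predict(query: Vector, labels: Dict[str, Vector],
            centroids: List[List[float]], clusters: List[List[str]],
            beam: int, top_k: int) -> Tuple[List[str], int]:
    # Phase 2 (matching): keep only the `beam` clusters with the highest dot-product score.
    shortlist = sorted(range(len(centroids)),
                       key=lambda c: dot(query, centroids[c]), reverse=True)[:beam]
    # Phase 3 (ranking): score only the labels inside the shortlisted clusters.
    candidates = [n for c in shortlist for n in clusters[c]]
    ranked = sorted(candidates, key=lambda n: dot(query, labels[n]), reverse=True)
    return ranked[:top_k], len(candidates)


# Invented toy label vectors: two loose groups (phone accessories vs. footwear).
labels = {
    "phone_case": [1.0, 0.1], "phone_charger": [0.9, 0.2], "usb_cable": [0.8, 0.3],
    "running_shoes": [0.1, 1.0], "hiking_boots": [0.2, 0.9], "tennis_socks": [0.3, 0.8],
}
centroids, clusters = build_index(labels, num_clusters=2)
top, scored = predict([0.95, 0.15], labels, centroids, clusters, beam=1, top_k=2)
print(top, scored)  # ['phone_case', 'phone_charger'] 3
```

Only 3 of the 6 labels are scored for this query. Because matching works on clusters, a rarely seen label can still be reached through its better-trained neighbors. This offers one intuition for why correlation among labels can help with the long tail.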
Findings
Achieves inference in under 1 millisecond on a problem with 2.8 million labels.
Effectively handles long-tail items with limited training data.
Demonstrates versatility with plug-and-play components for different phases.
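Sub-millisecond inference over 2.8 million labels is plausible only if each query scores a tiny fraction of the labels rather than all of them. The back-of-the-envelope sketch below assumes a balanced label tree searched top-down with beam search, as in tree-based XMR models. The branching factor (32) and beam width (10) are hypothetical values, not settings or measurements from the paper, and the count is an upper bound on nodes scored, not a timing.

```python
def tree_depth(num_labels: int, branching: int) -> int:
    """Smallest depth d such that branching**d >= num_labels (leaves = labels)."""
    depth, capacity = 0, 1
    while capacity < num_labels:
        capacity *= branching
        depth += 1
    return depth


def max_scored_per_query(num_labels: int, branching: int, beam: int) -> int:
    """Upper bound on tree nodes scored by beam search from the root down to the leaves."""
    total, frontier = 0, 1  # frontier = number of nodes kept at the current level
    for _ in range(tree_depth(num_labels, branching)):
        children = frontier * branching  # every kept node has at most `branching` children
        total += children
        frontier = min(beam, children)   # keep only the `beam` best-scoring children
    return total


L = 2_800_000
scored = max_scored_per_query(L, branching=32, beam=10)
print(scored)  # 1312, i.e. under 0.05% of the 2.8 million labels
```

Under these assumptions, a query touches at most 1,312 nodes instead of 2.8 million labels. The cost grows with tree depth (logarithmic in the number of labels) rather than linearly.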
Abstract
Many large-scale applications amount to finding relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can range from millions to billions, and can even be infinite in some applications. Moreover, training data is often limited for the long-tail items in the output space. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
