LSHTC: A Benchmark for Large-Scale Text Classification

Ioannis Partalas; Aris Kosmopoulos; Nicolas Baskiotis; Thierry Artieres; George Paliouras; Eric Gaussier; Ion Androutsopoulos; Massih-Reza Amini; Patrick Galinari

arXiv:1503.08581 · cs.IR · March 31, 2015 · 137 cites

Open Access

TL;DR

The paper introduces LSHTC, a benchmark for evaluating large-scale text classification systems on up to hundreds of thousands of classes, and describes its datasets, challenge design, and evaluation metrics.

Contribution

It provides publicly available datasets and a structured challenge framework for large-scale text classification research.

Findings

01

Datasets are publicly accessible online.

02

Evaluation measures are established for large-scale classification.

03

Initial challenge results are summarized.

Abstract

LSHTC is a series of challenges that aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along with the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented and a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.

Taxonomy

Topics: Text and Document Classification Technologies · Spam and Phishing Detection · Algorithms and Data Compression