Introducing LETOR 4.0 Datasets

Tao Qin; Tie-Yan Liu

arXiv:1306.2597·cs.IR·June 12, 2013·199 cites

TL;DR

LETOR 4.0 is a new benchmark dataset release for learning-to-rank research. It is built on the Gov2 web page collection (~25M pages) and two query sets from the TREC Million Query track, and is intended to support the evaluation and comparison of ranking algorithms.

Contribution

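This paper presents LETOR 4.0, an entirely new release of the LETOR learning-to-rank benchmark rather than an update of version 3.0. It is based on the Gov2 web page collection and the MQ2007 and MQ2008 query sets, and ships with standard features, relevance judgments, data partitioning, evaluation tools, and several baselines.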

Findings

01 New dataset based on Gov2 web pages and TREC query sets

02 Provides standard features, relevance judgments, and evaluation tools

03 Facilitates reliable benchmarking for learning to rank algorithms
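
To make the role of the evaluation tools concrete, the sketch below computes NDCG@k for one query from graded relevance labels listed in ranked order. It uses one common formulation (gain 2^rel - 1, discount log2(rank + 1)); this is an assumption for illustration and is not necessarily the exact definition used by the LETOR evaluation scripts. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the top-k labels, given in ranked order.

    Assumed formulation: gain 2^rel - 1, discount log2(position + 1),
    with positions starting at 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based here
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(labels, k):
    """DCG@k normalized by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    # A query with no relevant documents has no meaningful ideal ranking.
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0
```

A ranking that already places documents in decreasing order of relevance scores 1.0, and any other ordering of the same labels scores less. Averaging `ndcg_at_k` over all queries in a test fold gives a single number for comparing ranking algorithms.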

Abstract

LETOR is a package of benchmark data sets for research on LEarning TO Rank. It contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines. Version 1.0 was released in April 2007, version 2.0 in December 2007, and version 3.0 in December 2008. This version, 4.0, was released in July 2009. Unlike previous versions (V3.0 is an update based on V2.0, and V2.0 is an update based on V1.0), LETOR 4.0 is a totally new release. It uses the Gov2 web page collection (~25M pages) and two query sets from the Million Query track of TREC 2007 and TREC 2008, which we call MQ2007 and MQ2008 for short. About 1700 queries in MQ2007 and about 800 queries in MQ2008 have labeled documents. If you have any questions or suggestions about the datasets, please kindly email us…
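
The abstract does not describe the file layout, so the following is only a minimal sketch under a stated assumption: that each query-document pair is stored as one SVMlight-style text line of the form `label qid:ID index:value ... # comment`. The helper names `parse_line` and `group_by_query` and the three sample lines are made up for illustration and are not taken from the datasets.

```python
from collections import defaultdict


def parse_line(line):
    """Split one 'label qid:ID idx:val ... # comment' line into its parts."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = int(tokens[0])                 # graded relevance judgment
    qid = tokens[1].split(":", 1)[1]       # query identifier, kept as a string
    features = {}
    for tok in tokens[2:]:
        idx, val = tok.split(":", 1)
        features[int(idx)] = float(val)
    return label, qid, features, comment.strip()


def group_by_query(lines):
    """Collect (label, features) pairs per query, skipping blank lines."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, feats, _ = parse_line(line)
            groups[qid].append((label, feats))
    return dict(groups)


# Hypothetical sample lines in the assumed layout (not real LETOR data).
sample = [
    "2 qid:10 1:0.5 2:0.0 #docid = D1",
    "0 qid:10 1:0.1 2:0.3 #docid = D2",
    "1 qid:11 1:0.9 2:0.2 #docid = D3",
]
groups = group_by_query(sample)
```

Grouping by query matters because learning-to-rank metrics are computed per query and then averaged, so documents belonging to different queries should never be ranked against each other.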

Taxonomy

Topics: Advanced Data Processing Techniques