WTR: A Test Collection for Web Table Retrieval

Zhiyu Chen; Shuo Zhang; Brian D. Davison

arXiv:2105.02354·cs.IR·May 7, 2021

TL;DR

This paper introduces a large-scale test collection for Web table retrieval, built from Common Crawl, that includes relevance judgments for table context as well as for the tables themselves.

Contribution

It provides a dataset with relevance judgments for both tables and their context, along with baseline results from traditional and recently proposed table retrieval methods, to support the evaluation and development of Web table retrieval techniques.

Findings

01

Proper use of context labels can improve existing table retrieval methods

02

Baseline methods show varying effectiveness with context information

03

The dataset facilitates future research in web table retrieval

Abstract

We describe the development, characteristics, and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web table corpus extracted from the Common Crawl. Since a Web table usually has rich context information, such as the page title and surrounding paragraphs, we provide relevance judgments not only for query-table pairs but also for query-table context pairs, which previous test collections ignore. To facilitate future research with this benchmark, we provide details about how the dataset is pre-processed, as well as baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper usage of context labels can benefit previous table retrieval methods.
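
As a rough illustration of how a test collection with both table and context judgments might be used, the sketch below scores one ranked list with NDCG against each label type. Everything in it is an assumption made for illustration: the `qrels` layout, the `table` and `context` field names, the 0 to 2 grading scale, and the sample identifiers are hypothetical and are not taken from the released dataset.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical judgments keyed by (query_id, table_id).
# The "table"/"context" field names and the 0-2 scale are assumptions.
qrels: Dict[Tuple[str, str], Dict[str, int]] = {
    ("q1", "t1"): {"table": 2, "context": 1},
    ("q1", "t2"): {"table": 0, "context": 2},
    ("q1", "t3"): {"table": 1, "context": 0},
}


def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with exponential gain (2^g - 1)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(query_id: str, ranking: List[str], label: str, k: int = 10) -> float:
    """NDCG@k of a ranked list of table ids against one label type."""
    # Unjudged tables are treated as non-relevant (gain 0).
    gains = [qrels.get((query_id, t), {}).get(label, 0) for t in ranking[:k]]
    # Ideal ordering: all judged labels for this query, best first.
    ideal = sorted(
        (j[label] for (q, _), j in qrels.items() if q == query_id),
        reverse=True,
    )[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


ranking = ["t1", "t3", "t2"]  # output of some hypothetical retrieval method
table_score = ndcg_at_k("q1", ranking, "table")
context_score = ndcg_at_k("q1", ranking, "context")
```

In this toy example the ranking is ideal with respect to the table labels but not the context labels, which is the kind of difference that separate context judgments make visible.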

Code & Models

Repositories

Zhiyu-Chen/Web-Table-Retrieval-Benchmark
Official
