# Enriching Existing Test Collections with OXPath

**Authors:** Philipp Schaer, Mandy Neumann

arXiv: 1706.06836 · 2017-09-13

## TL;DR

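This paper introduces a lightweight method that uses the web data extraction language OXPath to enrich test collections with data harvested from the web. The authors demonstrate it by building GIRT4-XT, an extended version of GIRT4, and argue that the same approach makes it easier to extend existing test collections or create new ones for information retrieval evaluation.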

## Contribution

The paper presents a simple approach that employs OXPath to harvest web data for enriching test collections. It lowers the technical barrier compared with working with APIs or writing a web scraper tailored to the task at hand.

## Key findings

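- GIRT4 was extended to GIRT4-XT with additional metadata fields harvested via OXPath from the social sciences portal Sowiport
- The method applies to a variety of similar scenarios, including creating completely new test collections
- The extended collection can be reused for other evaluation purposes, such as bibliometrics-enhanced retrieval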

## Abstract

Extending TREC-style test collections by incorporating external resources is a time-consuming and challenging task. Making use of freely available web data requires technical skills to work with APIs or to create a web scraping program specifically tailored to the task at hand. We present a light-weight alternative that employs the web data extraction language OXPath to harvest data to be added to an existing test collection from web resources. We demonstrate this by creating an extended version of GIRT4 called GIRT4-XT with additional metadata fields harvested via OXPath from the social sciences portal Sowiport. This allows the re-use of this collection for other evaluation purposes like bibliometrics-enhanced retrieval. The demonstrated method can be applied to a variety of similar scenarios and is not limited to extending existing collections but can also be used to create completely new ones with little effort.
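
The enrichment step described in the abstract (adding harvested fields to existing collection records) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it does not show OXPath itself: it assumes the harvesting has already produced a mapping from document ID to extra fields. The record layout (`COLLECTION`, `DOC`, `DOCNO`), the field name `CITATIONS`, and the function `enrich_collection` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def enrich_collection(collection_xml: str,
                      harvested: Dict[str, Dict[str, str]]) -> str:
    """Add harvested metadata fields to matching documents.

    Assumption: documents are <DOC> elements identified by a <DOCNO> child.
    This is a simplified, hypothetical layout, not the actual GIRT4 format.
    """
    root = ET.fromstring(collection_xml)
    for doc in root.iter("DOC"):
        doc_id = (doc.findtext("DOCNO") or "").strip()
        # Look up the extra fields harvested for this document, if any.
        for field, value in harvested.get(doc_id, {}).items():
            # Never overwrite a field that the original collection already has.
            if doc.find(field) is None:
                ET.SubElement(doc, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<COLLECTION>"
        "<DOC><DOCNO>D1</DOCNO><TITLE>A</TITLE></DOC>"
        "<DOC><DOCNO>D2</DOCNO></DOC>"
        "</COLLECTION>"
    )
    # Hypothetical harvested data keyed by document ID.
    extra = {"D1": {"CITATIONS": "12"}}
    print(enrich_collection(sample, extra))
```

Keeping the harvested data separate from the merge step means the same merge can be rerun whenever the web source is harvested again, and existing fields in the collection stay untouched.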

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.06836/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1706.06836/full.md

---
Source: https://tomesphere.com/paper/1706.06836