Embedding Meta-Textual Information for Improved Learning to Rank

Toshitaka Kuwa; Shigehiko Schamoni; Stefan Riezler

arXiv:2010.16313 · cs.IR · February 3, 2021

TL;DR

This paper introduces a framework that learns embeddings for meta-textual information (such as patent classes or Wikipedia topics) alongside text to improve learning to rank in information retrieval, with considerable gains in cross-lingual Wikipedia retrieval and in patent retrieval.

Contribution

It extends neural representation learning, which had so far been applied only to text, to meta-textual categories, and trains the combined textual and meta-textual embeddings with a pairwise ranking objective to improve IR performance.

Findings

01

Significant gains in cross-lingual Wikipedia retrieval across three language pairs.

02

Improved patent retrieval performance for one language pair.

03

The mode of combining textual and meta-textual information is crucial for success.

Abstract

Neural approaches to learning term embeddings have led to improved computation of similarity and ranking in information retrieval (IR). So far neural representation learning has not been extended to meta-textual information that is readily available for many IR tasks, for example, patent classes in prior-art retrieval, topical information in Wikipedia articles, or product categories in e-commerce data. We present a framework that learns embeddings for meta-textual categories, and optimizes a pairwise ranking objective for improved matching based on combined embeddings of textual and meta-textual information. We show considerable gains in an experimental evaluation on cross-lingual retrieval in the Wikipedia domain for three language pairs, and in the Patent domain for one language pair. Our results emphasize that the mode of combining different types of information is crucial for model…
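
The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes that a query or document is a bag of terms plus a bag of meta-textual categories, that each bag is averaged into one vector, that the two vectors are combined either by concatenation or by element-wise sum, that relevance is a dot product, and that training uses a margin-based pairwise hinge loss. All function names, the toy vocabulary, and the categories are hypothetical, and the hand-derived SGD update covers the concatenation mode only.

```python
import random

DIM = 4  # hypothetical embedding size


def init_table(keys, seed):
    """Random embedding table; the initialization scheme is an assumption."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for k in keys}


def mean_vec(keys, table):
    """Average the embeddings of a bag of terms or categories."""
    vecs = [table[k] for k in keys]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def represent(item, tables, mode="concat"):
    """Combine textual and meta-textual embeddings of item = (terms, cats).

    'concat' stacks the two averaged vectors; 'sum' adds them element-wise.
    """
    text = mean_vec(item[0], tables[0])
    meta = mean_vec(item[1], tables[1])
    if mode == "concat":
        return text + meta
    if mode == "sum":
        return [a + b for a, b in zip(text, meta)]
    raise ValueError(f"unknown mode: {mode}")


def score(q, d, tables, mode="concat"):
    """Dot-product relevance score between combined representations."""
    return sum(a * b for a, b in zip(represent(q, tables, mode),
                                     represent(d, tables, mode)))


def pairwise_hinge(q, pos, neg, tables, margin=1.0, mode="concat"):
    """Pairwise loss: zero once pos outscores neg by at least `margin`."""
    return max(0.0, margin - score(q, pos, tables, mode)
               + score(q, neg, tables, mode))


def sgd_step(q, pos, neg, tables, lr=0.5, margin=1.0):
    """One SGD step on the hinge loss, concat mode only.

    In concat mode the score is mean(q_text).mean(d_text) plus
    mean(q_meta).mean(d_meta), so the gradient for one embedding is the
    other side's mean divided by the size of the bag it belongs to.
    """
    if pairwise_hinge(q, pos, neg, tables, margin) == 0.0:
        return  # ranking constraint already satisfied
    grads = {}
    # dL/ds(q, pos) = -1 and dL/ds(q, neg) = +1 while the hinge is active
    for doc, sign in ((pos, -1.0), (neg, 1.0)):
        for part, table in enumerate(tables):
            qm, dm = mean_vec(q[part], table), mean_vec(doc[part], table)
            for keys, other in ((q[part], dm), (doc[part], qm)):
                for k in keys:
                    g = grads.setdefault((part, k), [0.0] * DIM)
                    for i in range(DIM):
                        g[i] += sign * other[i] / len(keys)
    # Apply all updates after computing gradients from the old parameters
    for (part, k), g in grads.items():
        vec = tables[part][k]
        for i in range(DIM):
            vec[i] -= lr * g[i]


# Hypothetical toy data: a German query, one relevant English article
# sharing its category, and one irrelevant article from another category.
term_vocab = ["katze", "cat", "car"]
cat_vocab = ["Animals", "Vehicles"]
tables = (init_table(term_vocab, seed=0), init_table(cat_vocab, seed=1))

query = (["katze"], ["Animals"])
relevant = (["cat"], ["Animals"])
irrelevant = (["car"], ["Vehicles"])

before = pairwise_hinge(query, relevant, irrelevant, tables)
for _ in range(50):
    sgd_step(query, relevant, irrelevant, tables)
after = pairwise_hinge(query, relevant, irrelevant, tables)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Passing `mode="sum"` to `score` changes how the textual and categorical signals interact (summing mixes them into one space, while concatenation keeps them in separate dot-product terms). The abstract stresses that this combination choice matters, but which mode the paper prefers is not stated here.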
