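Which Training Corpora for Query Expansion via Word Embeddings? Application to Cultural Microblog Retrieval (original French title: Quels corpus d'entraînement pour l'expansion de requêtes par plongement de mots : application à la recherche de microblogs culturels)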

Philippe Mulhem, Lorraine Goeuriot, Massih-Reza Amini, Nayanika Dogra

arXiv:1911.07317 · cs.IR · November 19, 2019

TL;DR

This paper investigates the effectiveness of word embeddings trained on different corpora for query expansion in microblog retrieval, revealing that domain similarity does not always improve retrieval performance.

Contribution

It provides an experimental framework for analyzing how the choice of training corpus affects the word embeddings used for query expansion in microblog search.

Findings

1. Embeddings trained on domain-specific corpora do not always improve retrieval results.

2. The study highlights the complex relationship between training data and retrieval performance.

3. The results suggest that training corpora for query expansion should be selected carefully.

Abstract

We describe an experimental framework and the results obtained on microblog retrieval. We study the contribution of one popular approach, word embeddings, and investigate the impact of the training set on the learned embeddings. We focus on query expansion for the retrieval of tweets on the CLEF CMC 2016 corpus. Our results show that using embeddings trained on a corpus from the same domain as the indexed documents does not necessarily lead to better retrieval results.
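To make the idea of embedding-based query expansion concrete, here is a minimal sketch. It is a generic nearest-neighbour expansion, not the authors' exact method. Each query term is extended with the k vocabulary terms whose vectors have the highest cosine similarity to it. The names `expand_query`, `cosine`, and `TOY_EMBEDDINGS` are hypothetical, and the 3-dimensional vectors are invented for illustration. In practice the vectors would come from a model trained on some chosen corpus, and that choice of corpus is exactly what the paper studies, because it determines which neighbours get added.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_terms, embeddings, k=2):
    """Hypothetical helper: append the k nearest vocabulary terms of each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary term: no neighbours to add
        scored = [
            (cosine(embeddings[term], vec), other)
            for other, vec in embeddings.items()
            if other != term and other not in expanded  # skip terms already in the query
        ]
        scored.sort(reverse=True)  # highest similarity first
        expanded.extend(other for _, other in scored[:k])
    return expanded

# Toy vectors, invented for illustration; real ones would be learned from a training corpus.
TOY_EMBEDDINGS = {
    "festival": [0.9, 0.1, 0.0],
    "concert": [0.8, 0.2, 0.1],
    "music": [0.7, 0.3, 0.0],
    "theatre": [0.2, 0.9, 0.1],
    "football": [0.0, 0.1, 0.9],
}

print(expand_query(["festival"], TOY_EMBEDDINGS, k=2))  # ['festival', 'concert', 'music']
```

Training the same procedure on a different corpus changes the vectors, and therefore the expansion terms. This is why the training set can help or hurt retrieval even when the expansion algorithm itself is unchanged.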


Taxonomy

Topics: Natural Language Processing Techniques · Video Analysis and Summarization