Quels corpus d'entraînement pour l'expansion de requêtes par plongement de mots : application à la recherche de microblogs culturels
Philippe Mulhem, Lorraine Goeuriot, Massih-Reza Amini, and Nayanika Dogra

TL;DR
This paper investigates the effectiveness of word embeddings trained on different corpora for query expansion in microblog retrieval, revealing that domain similarity does not always improve retrieval performance.
Contribution
It provides an experimental framework analyzing how training corpus choice affects word embedding quality for microblog search.
Findings
Embeddings trained on domain-specific corpora do not always enhance retrieval results.
The study highlights the complex relationship between training data and retrieval performance.
Results suggest the need for careful selection of training corpora for query expansion tasks.
Abstract
We describe an experimental framework and the results obtained on microblog retrieval. We study the contribution of one popular approach, word embeddings, and investigate the impact of the training set on the learned embeddings. We focus on query expansion for the retrieval of tweets on the CLEF CMC 2016 corpus. Our results show that using embeddings trained on a corpus in the same domain as the indexed documents does not necessarily lead to better retrieval results.
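To make the idea of embedding-based query expansion concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not specify how expansion terms are selected or weighted, so this assumes the common nearest-neighbour scheme, in which each query term is expanded with its most cosine-similar vocabulary words. The function names (`cosine`, `expand_query`), the parameters `k` and `min_sim`, and the toy vectors are all hypothetical; in practice the vectors would come from a model trained on the chosen corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2, min_sim=0.5):
    """Append to the query up to k nearest neighbours of each query term.

    Assumption: neighbours are ranked by cosine similarity and kept only
    if their similarity reaches min_sim. Terms without a vector are left
    unexpanded.
    """
    expanded = list(query_terms)
    seen = set(query_terms)
    for term in query_terms:
        if term not in embeddings:
            continue
        scored = [
            (cosine(embeddings[term], vec), cand)
            for cand, vec in embeddings.items()
            if cand not in seen
        ]
        scored.sort(reverse=True)
        for sim, cand in scored[:k]:
            if sim >= min_sim:
                expanded.append(cand)
                seen.add(cand)
    return expanded


# Toy 3-dimensional vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "festival": [0.9, 0.1, 0.0],
    "concert": [0.8, 0.2, 0.1],
    "music": [0.7, 0.3, 0.0],
    "train": [0.0, 0.1, 0.9],
    "ticket": [0.3, 0.2, 0.6],
}

if __name__ == "__main__":
    print(expand_query(["festival"], TOY_EMBEDDINGS, k=2))
```

With these toy vectors, the query "festival" is expanded with "concert" and "music". Swapping `TOY_EMBEDDINGS` for vectors learned on a different corpus changes the neighbours, and therefore the expanded query, which is the effect the paper measures on retrieval.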
Taxonomy
Topics: Natural Language Processing Techniques · Video Analysis and Summarization
