Overview of the TREC 2023 deep learning track
Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Hossein A. Rahmani, Daniel Campos, Jimmy Lin, Ellen M. Voorhees, Ian Soboroff

TL;DR
The paper reviews the TREC 2023 Deep Learning track, highlighting the use of large language models for ranking, the creation of synthetic queries, and prompt-based methods outperforming earlier approaches.
Contribution
It reports on the use of LLM prompting for ranking, compares synthetic and human queries, and shows that prompt-based methods were effective in the TREC 2023 track.
Findings
Runs using LLM prompting outperformed runs using the "nnlm" approach.
Synthetic queries yielded system rankings similar to those from human queries.
No clear evidence of bias was observed: runs using GPT-4 were not favored on GPT-4 synthetic queries, nor runs using T5 on T5 synthetic queries.
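One common way to check whether two query sets produce similar system rankings is a rank correlation such as Kendall's tau over per-run scores. The sketch below is illustrative only: the run names and scores are hypothetical, the function `kendall_tau` is written here for the example, and it is not taken from the track's evaluation code.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two dicts mapping run name -> score."""
    systems = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # A pair is concordant when both query sets order the two runs the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical per-run effectiveness scores (not results from the paper).
human = {"run_a": 0.52, "run_b": 0.47, "run_c": 0.41, "run_d": 0.33}
synthetic = {"run_a": 0.55, "run_b": 0.44, "run_c": 0.46, "run_d": 0.30}

tau = kendall_tau(human, synthetic)
print(f"Kendall's tau between human and synthetic rankings: {tau:.3f}")
```

A value near 1 means the two query sets order the runs almost identically; a value near 0 means the orderings are unrelated.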
Abstract
This is the fifth year of the TREC Deep Learning track. As in previous years, we leverage the MS MARCO datasets that made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. We mostly repeated last year's design, to get another matching test set, based on the larger, cleaner, less-biased v2 passage and document set, with passage ranking as primary and document ranking as a secondary task (using labels inferred from passage). As we did last year, we sample from MS MARCO queries that were completely held out, unused in corpus construction, unlike the test queries in the first three years. This approach yields a more difficult test with more headroom for improvement. Alongside the usual MS MARCO (human) queries, this year we generated synthetic queries using a fine-tuned T5 model and using a GPT-4 prompt. The new…
