HLTCOE at LiveRAG: GPT-Researcher using ColBERT retrieval
Kevin Duh, Eugene Yang, Orion Weller, Andrew Yates, Dawn Lawrie

TL;DR
This paper presents a question-answering system built on the GPT-researcher framework that combines ColBERT retrieval, LLM-based query generation, and result filtering. It placed 5th in the LiveRAG automatic evaluation for correctness.
Contribution
It combines multi-vector dense retrieval (a ColBERT model fine-tuned for multilingual retrieval, indexed with PLAID-X), LLM-based query generation, and passage filtering within the GPT-researcher framework for question answering.
Findings
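Placed 5th in the LiveRAG automatic evaluation for correctness with a score of 1.07.
Used a ColBERT bi-encoder fine-tuned for multilingual retrieval, searching a local, compressed PLAID-X index of the FineWeb10-BT collection.
Used separate models for each stage: Qwen2.5-7B-Instruct for query generation, m2-bert-80M-8k-retrieval for filtering, and Falcon3-10B for answer generation from up to nine passages.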
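The stages listed above can be sketched as a simple linear pipeline. This is only an illustrative sketch, not the authors' implementation: the function `answer_question` and its callable parameters are hypothetical names, the real GPT-researcher framework is more involved than a single pass, and the de-duplication step and boolean relevance filter are simplifying assumptions. The cap of nine passages is the one figure taken from the abstract.

```python
from typing import Callable, List

# The abstract reports that up to nine passages were used as context.
MAX_CONTEXT_PASSAGES = 9


def answer_question(
    question: str,
    generate_queries: Callable[[str], List[str]],   # stand-in for the query-generation LLM
    retrieve: Callable[[str], List[str]],           # stand-in for ColBERT / PLAID-X search
    is_relevant: Callable[[str, str], bool],        # stand-in for the filtering model
    generate_answer: Callable[[str, List[str]], str],  # stand-in for the answer LLM
) -> str:
    """Hypothetical single-pass pipeline: generate queries, retrieve, filter, answer."""
    context: List[str] = []
    seen = set()
    for query in generate_queries(question):
        for passage in retrieve(query):
            if passage in seen:  # assumption: drop passages returned by several queries
                continue
            seen.add(passage)
            if is_relevant(question, passage):
                context.append(passage)
    # Keep at most nine passages, in the order they were retrieved.
    return generate_answer(question, context[:MAX_CONTEXT_PASSAGES])


# Toy stand-ins so the sketch runs without any models.
corpus = [f"passage {i} about colbert" for i in range(12)] + ["unrelated passage"]
result = answer_question(
    "What is ColBERT?",
    generate_queries=lambda q: [q],
    retrieve=lambda query: corpus,
    is_relevant=lambda q, p: "colbert" in p,
    generate_answer=lambda q, ctx: f"{len(ctx)} passages used",
)
print(result)
```

Passing each stage in as a callable keeps the sketch independent of any particular model API; in a real system each stand-in would wrap the corresponding model call.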
Abstract
The HLTCOE LiveRAG submission utilized the GPT-researcher framework for researching the context of the question, filtering the returned results, and generating the final answer. The retrieval system was a ColBERT bi-encoder architecture, which represents a passage with many dense tokens. Retrieval used a local, compressed index of the FineWeb10-BT collection created with PLAID-X, using a model fine-tuned for multilingual retrieval. Query generation from context was done with Qwen2.5-7B-Instruct, while filtering was accomplished with m2-bert-80M-8k-retrieval. Up to nine passages were used as context to generate an answer using Falcon3-10B. This system placed 5th in the LiveRAG automatic evaluation for correctness with a score of 1.07.
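To make "represents a passage with many dense tokens" concrete, the following is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring: each query token vector is matched to its most similar passage token vector, and the per-token maxima are summed. The function names and the toy two-dimensional vectors are illustrative assumptions. The sketch does not model the compression or candidate pruning that a PLAID-X index applies, and real embeddings come from the trained encoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], passage_vecs: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any passage token."""
    if not query_vecs or not passage_vecs:
        return 0.0
    return sum(max(dot(q, p) for p in passage_vecs) for q in query_vecs)


# Toy embeddings (hypothetical values): one vector per token.
query = [(1.0, 0.0), (0.0, 1.0)]
passage_a = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # has a close match for each query token
passage_b = [(0.5, 0.5)]                          # a single, partially matching token

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)

# Rank passages by descending MaxSim score.
ranked = sorted(
    [("a", score_a), ("b", score_b)], key=lambda pair: pair[1], reverse=True
)
print(ranked)
```

Because every token keeps its own vector, a passage can score well by matching different query tokens with different passage tokens, which a single-vector bi-encoder cannot express.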
Taxonomy
Topics: Scientific Research and Technology · Artificial Intelligence in Healthcare and Education · Mathematics, Computing, and Information Processing
