HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

Ehsan Kamalloo; Aref Jafari; Xinyu Zhang; Nandan Thakur; Jimmy Lin

arXiv:2307.16883 · cs.CL · August 1, 2023 · 27 cites

TL;DR

HAGRID is a new publicly available dataset designed to facilitate the development of generative information-seeking models that can retrieve relevant quotes and generate explanations with proper attributions, built through human and LLM collaboration.

Contribution

This paper introduces HAGRID, a novel dataset for training and evaluating attribution-capable generative information-seeking models, based on human-LLM collaboration and automatic data collection.

Findings

01

HAGRID enables the development of models with improved attribution capabilities.

02

The dataset includes human-annotated evaluations of LLM-generated explanations.

03

HAGRID is built atop the English subset of MIRACL, a publicly available information retrieval dataset.

Abstract

The rise of large language models (LLMs) had a transformative impact on search, ushering in a new era of search engines that are capable of generating search results in natural language text, imbued with citations for supporting sources. Building generative information-seeking models demands openly accessible datasets, which currently remain lacking. In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations. Unlike recent efforts that focus on human evaluation of black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset. HAGRID is constructed based on human and LLM…

Code & Models

Repositories

project-miracl/hagrid (Official)

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Attention Dropout · Residual Connection · Softmax · Cosine Annealing