Training Language Models to Generate Text with Citations via Fine-grained Rewards

Chengyu Huang; Zeqiu Wu; Yushi Hu; Wenya Wang

arXiv:2402.04315 · cs.CL · September 4, 2024 · 1 citation

TL;DR

This paper introduces a training framework with fine-grained rewards to improve LLMs in generating accurate, citation-supported responses, reducing hallucinations and enhancing credibility, especially for smaller models.

Contribution

It presents a novel reward-based training method that significantly improves citation generation and response correctness in LLMs, outperforming traditional training strategies.

Findings

1. Fine-grained rewards enhance citation relevance and support in LLMs.
2. The method outperforms baseline models, including GPT-3.5-turbo.
3. Improved performance is demonstrated on the ALCE and EXPERTQA datasets.

Abstract

While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performance is far from satisfactory, especially for smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating its advantage over conventional practices. We conduct extensive…
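The abstract does not spell out how the fine-grained rewards are defined. As a rough illustration only, the sketch below assumes rewards in the spirit of the ALCE citation metrics: a citation-recall reward per sentence (do the cited documents, taken together, support it?), a citation-precision score per citation (is each cited document actually needed?), and a correctness reward for the whole response (are the gold short answers present?), which is attached to the final sentence. The word-overlap `toy_entails` is a hypothetical stand-in for an NLI model, and every function name, the `[k]` citation format, and the weights `w_rec`, `w_prec`, `w_corr` are illustrative assumptions, not taken from the authors' code.

```python
import re
from dataclasses import dataclass
from typing import Callable

Entails = Callable[[str, str], bool]  # (premise, hypothesis) -> supported?


@dataclass
class Statement:
    text: str             # sentence with citation markers stripped
    citations: list[int]  # 0-based indices into the retrieved documents


def _words(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: every word of the hypothesis
    must appear in the premise. A real system would call a trained NLI model."""
    return _words(hypothesis) <= _words(premise)


def parse_response(response: str, num_docs: int) -> list[Statement]:
    """Split on sentence-final punctuation and extract 1-based [k] markers,
    dropping markers that point outside the document list."""
    statements = []
    for sent in re.split(r"(?<=[.!?])\s+", response.strip()):
        ks = [int(k) for k in re.findall(r"\[(\d+)\]", sent)]
        cites = [k - 1 for k in ks if 1 <= k <= num_docs]
        text = re.sub(r"\s*\[\d+\]", "", sent).strip()
        statements.append(Statement(text, cites))
    return statements


def citation_recall(st: Statement, docs: list[str], entails: Entails) -> float:
    """1.0 if the cited documents, concatenated, support the statement."""
    if not st.citations:
        return 0.0
    return float(entails(" ".join(docs[i] for i in st.citations), st.text))


def citation_precision(st: Statement, docs: list[str], entails: Entails) -> list[float]:
    """One score per citation: 0.0 if it neither supports the statement alone
    nor is needed by the remaining citations (a simplified ALCE-style check)."""
    if citation_recall(st, docs, entails) == 0.0:
        return [0.0] * len(st.citations)
    scores = []
    for i in st.citations:
        rest = [j for j in st.citations if j != i]
        alone = entails(docs[i], st.text)
        rest_ok = bool(rest) and entails(" ".join(docs[j] for j in rest), st.text)
        scores.append(1.0 if alone or not rest_ok else 0.0)
    return scores


def correctness(response: str, gold_answers: list[str]) -> float:
    """Fraction of gold short answers found in the response (case-insensitive)."""
    hits = sum(a.lower() in response.lower() for a in gold_answers)
    return hits / len(gold_answers) if gold_answers else 0.0


def fine_grained_rewards(response: str, docs: list[str], gold_answers: list[str],
                         entails: Entails = toy_entails,
                         w_rec: float = 1.0, w_prec: float = 1.0,
                         w_corr: float = 1.0) -> list[float]:
    """Per-sentence rewards; the response-level correctness reward is added
    to the last sentence. The weights are hypothetical hyperparameters."""
    rewards = []
    for st in parse_response(response, len(docs)):
        prec = citation_precision(st, docs, entails)
        mean_prec = sum(prec) / len(prec) if prec else 0.0
        rewards.append(w_rec * citation_recall(st, docs, entails) + w_prec * mean_prec)
    if rewards:
        rewards[-1] += w_corr * correctness(response, gold_answers)
    return rewards


docs = ["Paris is the capital of France.", "France is in Europe.", "Bananas are yellow."]
response = "Paris is the capital of France [1][3]. France is in Europe [2]."
print(fine_grained_rewards(response, docs, gold_answers=["Paris"]))  # [1.5, 3.0]
```

In the usage example, the first sentence earns full recall but only half precision because document [3] is irrelevant, while the second sentence is fully supported and also carries the correctness reward. Keeping rewards per sentence rather than a single score per response is what would let a trainer assign credit to the specific span that caused a citation error.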

Code & Models

Repositories

hcy123902/atg-w-fg-rw
PyTorch · Official

Datasets

kozi2/ax_dataset
Dataset · 6 downloads

Videos

Training Language Models to Generate Text with Citations via Fine-grained Rewards (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Attention Dropout · Residual Connection