News Reporter: A Multi-lingual LLM Framework for Broadcast T.V News

Tarun Jain; Yufei Gao; Sridhar Vanga; Karan Singla

arXiv:2410.07520·cs.CL·November 7, 2024

TL;DR

This paper introduces a multi-lingual LLM framework tailored for broadcast TV news, leveraging a new dataset of QA pairs from news transcripts and a retrieval-augmented generation approach to enhance answer accuracy and verifiability.

Contribution

It presents a new dataset of QA pairs extracted from news transcripts, an LLM fine-tuned on that dataset, and a retrieval-augmented generation method that grounds answers in verifiable news recordings.

Findings

01 Model surpasses similar-sized base models on open benchmarks.

02 QA dataset improves LLM training for news applications.

03 RAG method enhances answer contextualization and verification.

Abstract

Large Language Models (LLMs) have quickly become essential tools for many conversational chatbots due to their ability to provide coherent answers to varied queries. Datasets used to train these LLMs are often a mix of generic and synthetic samples, and thus lack the verification needed to provide correct and verifiable answers for T.V. news. We collect and share a large collection of QA pairs extracted from transcripts of news recordings from various news channels across the United States. The resulting QA pairs are then used to fine-tune an off-the-shelf LLM. Our model surpasses base models of similar size on several open LLM benchmarks. We further propose and integrate a RAG method to improve the contextualization of our answers and to point each answer to a verifiable news recording.
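The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` type, the `recording_id` field, the bag-of-words cosine scoring, and the prompt wording are all assumptions made for illustration, and a real system would likely use learned embeddings and an actual LLM call. The sketch only shows the general idea of retrieving transcript segments for a question and keeping a pointer to the source recording so the answer can be verified.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    recording_id: str  # hypothetical identifier of the source news recording
    text: str          # transcript text of this segment


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, segments: list[Segment], k: int = 2) -> list[Segment]:
    # Rank segments by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(
        segments,
        key=lambda s: cosine(q, Counter(tokenize(s.text))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, retrieved: list[Segment]) -> str:
    # Prefix each retrieved segment with its recording id so a generated
    # answer can cite a recording that a reader could check.
    context = "\n".join(f"[{s.recording_id}] {s.text}" for s in retrieved)
    return (
        "Answer using only the context and cite the recording id.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Made-up transcript segments, for illustration only.
segments = [
    Segment("rec_A", "The city council approved the new transit budget on Tuesday."),
    Segment("rec_B", "Heavy rain caused flooding across the coastal highway."),
    Segment("rec_C", "Local team wins championship after overtime."),
]

question = "What did the city council approve?"
top = retrieve(question, segments, k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` would then be passed to the fine-tuned model; carrying the recording identifier through the context is what lets the final answer be traced back to a specific broadcast.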

Taxonomy

Topics: Power Systems and Technologies · Digital Rights Management and Security · Natural Language Processing Techniques

Methods: WordPiece · Attention Dropout · Attention Is All You Need · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dropout · Byte Pair Encoding · BERT