ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Peng Xu; Wei Ping; Xianchao Wu; Chejian Xu; Zihan Liu; Mohammad Shoeybi; Bryan Catanzaro

arXiv:2407.14482·cs.CL·February 18, 2025

Open Access · 5 Models · 1 Dataset

TL;DR

ChatQA 2 is a Llama 3.0-based model with a context window extended to 128K tokens. It improves long-context understanding and retrieval-augmented generation (RAG), and outperforms most existing state-of-the-art models on ultra-long-context and RAG tasks.

Contribution

The paper presents a continued training recipe that extends the context window of Llama3-70B-base from 8K to 128K tokens, together with a three-stage instruction tuning process that strengthens instruction following, RAG performance, and long-context understanding.

Findings

01

Outperforms GPT-4-Turbo-2024-04-09 and other models on ultra-long-context tasks.

02

With a larger set of top-k retrieved chunks, RAG outperforms direct long-context methods.

03

Open-sourced model, training data, and evaluation setup.
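
Finding 02 turns on how many retrieved chunks (top-k) are placed in the prompt. The sketch below only illustrates that knob. It is not the paper's retriever: the bag-of-words cosine scoring, the word-based chunking, the choice to keep the selected chunks in document order, and the helper names `chunk_words`, `retrieve_top_k`, and `build_prompt` are all assumptions made for illustration.

```python
import math
from collections import Counter


def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, chunks, k):
    """Return the k highest-scoring chunks, kept in document order.

    Toy scorer (assumption): a real system would use a learned embedding model.
    """
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), i) for i, c in enumerate(chunks)]
    # Highest score first; ties are broken by earlier position in the document.
    best = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return [chunks[i] for _, i in sorted(best, key=lambda s: s[1])]


def build_prompt(query, chunks, k):
    """Concatenate the top-k chunks as context, followed by the question."""
    context = "\n\n".join(retrieve_top_k(query, chunks, k))
    return f"{context}\n\nQuestion: {query}"
```

Raising `k` places more of the source document in the prompt, which is only practical when the model's context window is large enough to hold the retrieved chunks.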

Abstract

In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are complementary to each other and essential for LLMs to process large volumes of information that cannot fit into a single prompt. We present a detailed continued training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, Qwen2-72B-Instruct, and…

Code & Models

Models

Datasets

nvidia/ChatQA2-Long-SFT-data
dataset · 464 dl

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Sparse Evolutionary Training · LLaMA · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Byte Pair Encoding · Layer Normalization · Linear Layer