ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Peng Xu, Wei Ping, Xianchao Wu, Chejian Xu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro

TL;DR
ChatQA 2 is a Llama 3.0-based model with a 128K-token context window. It substantially improves long-context understanding and retrieval-augmented generation (RAG), and it outperforms existing models on ultra-long-context and RAG tasks.
Contribution
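The paper presents a novel continued-training recipe that extends the context window of Llama 3.0 from 8K to 128K tokens. It pairs this with a three-stage instruction-tuning process that improves instruction following, RAG performance, and long-context understanding. The resulting model outperforms most existing state-of-the-art models.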
Findings
- Outperforms GPT-4-Turbo-2024-04-09 and other models on ultra-long-context tasks.
- RAG with a larger number of top-k retrieved chunks outperforms the direct long-context approach.
- The model weights, training data, and evaluation setup are open-sourced.
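The top-k finding rests on a simple pipeline. The long document is split into chunks, and each chunk is scored against the query. Only the k best chunks go into the prompt instead of the whole document. The sketch below is a minimal illustration of that pipeline and makes several assumptions. It uses fixed-size word chunking. Its scorer is a toy bag-of-words cosine similarity, a hypothetical stand-in for a learned retriever. The function names and parameter values are illustrative and are not taken from the ChatQA 2 code.

```python
import math
from collections import Counter


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a real retriever embedding)."""
    return Counter(w.lower().strip(".,?!") for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_chunks(query: str, document: str, chunk_size: int, k: int) -> list[str]:
    """Return the k chunks most similar to the query, kept in document order."""
    chunks = chunk_words(document, chunk_size)
    q = embed(query)
    # Rank chunk indices by descending similarity; ties go to the earlier chunk.
    ranked = sorted(range(len(chunks)), key=lambda i: (-cosine(q, embed(chunks[i])), i))
    # Restore original order for the selected chunks (a choice made by this sketch).
    return [chunks[i] for i in sorted(ranked[:k])]


def build_prompt(query: str, document: str, chunk_size: int = 4, k: int = 2) -> str:
    """Assemble a RAG prompt from the retrieved chunks plus the question."""
    context = "\n\n".join(top_k_chunks(query, document, chunk_size, k))
    return f"{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    doc = ("The cat sat on the mat. Paris is the capital of France. "
           "Dogs bark loudly at night. France exports wine and cheese.")
    print(build_prompt("What is the capital of France?", doc))
```

Raising `k` puts more retrieved text into the prompt. It trades prompt length for a better chance of including the relevant passage. The sketch only illustrates the mechanism; it does not reproduce the paper's retriever or chunking settings.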
Abstract
In this work, we introduce ChatQA 2, a Llama 3.0-based model with a 128K context window, designed to bridge the gap between open-source LLMs and leading proprietary models (e.g., GPT-4-Turbo-2024-04-09) in long context understanding and retrieval-augmented generation (RAG) capabilities. These two capabilities are complementary to each other and essential for LLMs to process large volumes of information that cannot fit into a single prompt. We present a detailed continued training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens, along with a three-stage instruction tuning process to enhance the model's instruction-following, RAG performance, and long-context understanding capabilities. Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models, including GPT-4-Turbo-2024-04-09, Qwen2-72B-Instruct, and…
Code & Models
- 🤗 nvidia/Llama3-ChatQA-2-70B (model) · 42 downloads · 14 likes
- 🤗 nvidia/Llama3-ChatQA-2-8B (model) · 197 downloads · 17 likes
- 🤗 KnutJaegersberg/Llama3-ChatQA-2-70B-4.0bpw-exl2 (model) · 2 downloads
- 🤗 QuantFactory/Llama3-ChatQA-2-8B-GGUF (model) · 113 downloads · 3 likes
- 🤗 KnutJaegersberg/Llama3-ChatQA-2-70B-4.65bpw-exl2 (model) · 1 download
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Sparse Evolutionary Training · LLaMA · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Byte Pair Encoding · Layer Normalization · Linear Layer
