Tagging-Augmented Generation: Assisting Language Models in Finding Intricate Knowledge In Long Contexts

Anwesan Pal; Karen Hovsepian; Tinghao Guo; Mengnan Zhao; Somendra Tripathi; Nikos Kanakaris; George Mihaila; Sumit Nigam

arXiv:2510.22956·cs.CL·October 28, 2025

TL;DR

This paper introduces Tagging-Augmented Generation (TAG), a lightweight data augmentation method that enhances large language models' ability to handle long contexts in question-answering tasks without complex pre-processing.

Contribution

The authors propose a novel tagging-based augmentation strategy that improves LLM performance on long-context QA benchmarks, avoiding the drawbacks of retrieval and chunking methods.

Findings

1. Up to 17% performance improvement on 32K-token contexts
2. 2.9% gain on complex multi-hop reasoning questions
3. Effective augmentation without altering document integrity

Abstract

Recent investigations into effective context lengths of modern flagship large language models (LLMs) have revealed major limitations in effective question answering (QA) and reasoning over long and complex contexts for even the largest and most impressive cadre of models. While approaches like retrieval-augmented generation (RAG) and chunk-based re-ranking attempt to mitigate this issue, they are sensitive to chunking, embedding and retrieval strategies and models, and furthermore, rely on extensive pre-processing, knowledge acquisition and indexing steps. In this paper, we propose Tagging-Augmented Generation (TAG), a lightweight data augmentation strategy that boosts LLM performance in long-context scenarios, without degrading and altering the integrity and composition of retrieved documents. We validate our hypothesis by augmenting two challenging and directly relevant…
