MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation
Jyotika Singh, Fang Tu, Miguel Ballesteros, Weiyi Sun, Sandip Ghoshal, Michelle Yuan, Yassine Benajiba, Sujith Ravi, Dan Roth

TL;DR
MT-OSC is a framework that condenses chat history in the background so that LLMs can handle multi-turn conversations with fewer tokens while maintaining or improving accuracy.
Contribution
It introduces an automatic condensation method built around a Condenser Agent, which pairs a few-shot inference-based Condenser with a lightweight Decider to retain essential information while reducing token counts.
Findings
Reduces token counts by up to 72% in 10-turn dialogues.
Consistently improves or maintains accuracy across 13 LLMs and diverse multi-turn benchmarks.
Robust to distractors and irrelevant turns in multi-turn conversations.
Abstract
Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC…
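The Condenser Agent described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name here (`count_tokens`, `toy_decider`, `toy_condenser`, `CondenserAgent`, `budget`, `keep_words`) is hypothetical. The abstract does not specify what the Decider decides, so the sketch assumes it gates condensation on a token budget, and it replaces the few-shot LLM Condenser with simple truncation so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def toy_decider(history: List[str], budget: int) -> bool:
    # Hypothetical Decider: condense only once the history exceeds the budget.
    return sum(count_tokens(turn) for turn in history) > budget


def toy_condenser(history: List[str], keep_words: int = 6) -> str:
    # Hypothetical Condenser stand-in: a real one would be a few-shot LLM call.
    # Here each turn is cut to its first `keep_words` words and the turns are joined.
    return " | ".join(" ".join(turn.split()[:keep_words]) for turn in history)


@dataclass
class CondenserAgent:
    budget: int = 10
    history: List[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        # Append the new turn, then replace the history with one condensed
        # entry whenever the Decider says it has grown too large.
        self.history.append(turn)
        if toy_decider(self.history, self.budget):
            self.history = [toy_condenser(self.history)]

    def prompt_context(self) -> str:
        # Text that would be prepended to the next prompt instead of the full log.
        return "\n".join(self.history)


turns = [
    "Write a function that sorts a list of integers",
    "It should be stable and handle duplicates correctly please",
]
full_tokens = sum(count_tokens(t) for t in turns)

agent = CondenserAgent(budget=10)
for t in turns:
    agent.add_turn(t)
condensed_tokens = count_tokens(agent.prompt_context())
```

In this toy run the condensed context is shorter than the full history, but truncation discards information that an LLM-based Condenser would be expected to preserve, so the sketch shows only the control flow, not the quality of condensation.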
