RTTC: Reward-Guided Collaborative Test-Time Compute
J. Pablo Muñoz, Jinjie Yuan

TL;DR
RTTC is an adaptive framework that uses a pretrained reward model to choose a test-time compute strategy for each query, improving accuracy over vanilla RAG and TTT while avoiding unnecessary computation across diverse domains and tasks.
Contribution
This work presents RTTC, a reward-guided, adaptive approach for selecting test-time compute strategies, incorporating distributed retrieval, lightweight fine-tuning, and query-state caching.
Findings
RTTC outperforms vanilla RAG and TTT in accuracy across multiple benchmarks.
Adaptive strategy selection reduces computational overhead.
Query-State Caching improves efficiency by reusing historical query states.
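As a rough illustration of the caching finding, the sketch below reuses a stored state when a new query is close to a past one. The summary does not say what a query state contains or how matches are found, so the `QueryStateCache` class, the cosine-similarity lookup, and the `min_similarity` threshold are all hypothetical choices made for this example, not the paper's design.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class QueryStateCache:
    """Hypothetical cache: maps past query embeddings to a stored state.

    The stored state is opaque here (for instance, the strategy that
    worked last time); its real contents are an assumption.
    """

    def __init__(self, min_similarity: float = 0.9) -> None:
        self.min_similarity = min_similarity
        self._entries: List[Tuple[List[float], Any]] = []

    def put(self, embedding: Sequence[float], state: Any) -> None:
        self._entries.append((list(embedding), state))

    def get(self, embedding: Sequence[float]) -> Optional[Any]:
        """Return the state of the most similar past query, or None on a miss."""
        best_state, best_sim = None, self.min_similarity
        for past_embedding, state in self._entries:
            sim = cosine(embedding, past_embedding)
            if sim >= best_sim:
                best_state, best_sim = state, sim
        return best_state
```

On a hit, a caller could skip retrieval or fine-tuning for the new query; on a miss (`None`), it would fall back to the full pipeline and `put` the result afterwards.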
Abstract
Test-Time Compute (TTC) has emerged as a powerful paradigm for enhancing the performance of Large Language Models (LLMs) at inference, leveraging strategies such as Test-Time Training (TTT) and Retrieval-Augmented Generation (RAG). However, the optimal adaptation strategy varies across queries, and indiscriminate application of TTC strategies incurs substantial computational overhead. In this work, we introduce Reward-Guided Test-Time Compute (RTTC), a novel framework that adaptively selects the most effective TTC strategy for each query via a pretrained reward model, maximizing downstream accuracy across diverse domains and tasks. RTTC operates in a distributed server-client architecture, retrieving relevant samples from a remote knowledge base and applying RAG or lightweight fine-tuning on client devices only when necessary. To further mitigate redundant computation, we propose…
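The per-query selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cheapest-first escalation order, the fixed `threshold`, and the names `select_strategy`, `Candidate`, and `reward_fn` are all hypothetical. The abstract states only that a pretrained reward model guides the choice and that RAG or fine-tuning is applied only when necessary.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Candidate:
    strategy: str   # e.g. "base", "rag", "ttt"
    answer: str
    reward: float   # score assigned by the reward model


def select_strategy(
    query: str,
    strategies: Sequence[Tuple[str, Callable[[str], str]]],
    reward_fn: Callable[[str, str], float],
    threshold: float = 0.8,
) -> Candidate:
    """Try strategies in the given (assumed cheapest-first) order.

    Stops at the first answer whose reward reaches `threshold`, so costlier
    strategies run only when cheaper ones score poorly. If none reaches the
    threshold, the highest-scoring candidate is returned.
    """
    if not strategies:
        raise ValueError("at least one strategy is required")
    candidates: List[Candidate] = []
    for name, generate in strategies:
        answer = generate(query)
        candidate = Candidate(name, answer, reward_fn(query, answer))
        candidates.append(candidate)
        if candidate.reward >= threshold:
            return candidate  # good enough: skip the remaining strategies
    return max(candidates, key=lambda c: c.reward)
```

In use, `strategies` might be a list such as `[("base", plain_generate), ("rag", rag_generate), ("ttt", ttt_generate)]`, where each callable is a placeholder for the corresponding pipeline and `reward_fn` wraps the reward model's scoring call.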
