Learning to Correct for QA Reasoning with Black-box LLMs
Jaehyung Kim, Dongyoung Kim, Yiming Yang

TL;DR
This paper introduces CoBB, a novel method that enhances the reasoning capabilities of black-box large language models in question-answering tasks by using a trained adaptation model to correct their reasoning without requiring access to internal probabilities.
Contribution
The paper proposes CoBB, a new approach that uses a small open-source LLM and a genetic algorithm to select training pairs, improving reasoning accuracy of black-box LLMs without increased inference costs.
Findings
CoBB significantly improves reasoning accuracy across multiple QA benchmarks.
The adaptation model effectively corrects imperfect reasonings of black-box LLMs.
The dataset construction via genetic algorithm optimizes training pair selection.
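As a rough illustration of how a genetic algorithm can select a subset of training pairs, the sketch below evolves fixed-size index sets toward higher fitness. It is a generic textbook-style sketch, not the paper's procedure: the function name `select_subset_ga`, the per-pair `scores`, and the sum-of-scores fitness are all hypothetical stand-ins, since the fitness criterion CoBB uses is not stated in the text visible here.

```python
import random


def select_subset_ga(scores, k, pop_size=20, generations=50,
                     mutation_rate=0.2, seed=0):
    """Pick k indices from range(len(scores)) with a simple genetic algorithm.

    Assumption: fitness is the sum of hypothetical per-pair scores.
    The real selection criterion in CoBB may differ.
    """
    rng = random.Random(seed)
    n = len(scores)

    def fitness(subset):
        return sum(scores[i] for i in subset)

    def crossover(a, b):
        # The child draws k indices from the union of both parents.
        pool = sorted(a | b)
        return frozenset(rng.sample(pool, k))

    def mutate(subset):
        # With some probability, swap one selected index for an unselected one.
        outside = [i for i in range(n) if i not in subset]
        if outside and rng.random() < mutation_rate:
            members = set(subset)
            members.remove(rng.choice(sorted(members)))
            members.add(rng.choice(outside))
            return frozenset(members)
        return subset

    # Start from random subsets of size k.
    population = [frozenset(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return sorted(max(population, key=fitness))


# Usage with made-up scores for six candidate pairs.
scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]
chosen = select_subset_ga(scores, k=3)
print(chosen)
```

Keeping the fitter half of each generation means the best subset found so far is never lost. A real pipeline would replace the toy fitness with whatever measure of representativeness the training-pair selection actually targets.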
Abstract
An open challenge in recent machine learning is how to improve the reasoning capability of large language models (LLMs) in a black-box setting, i.e., without access to detailed information such as output token probabilities. Existing approaches either rely on such access (which is often unrealistic) or involve significantly increased training- and inference-time costs. This paper addresses these limitations by proposing a novel approach, namely CoBB (Correct for improving QA reasoning of Black-Box LLMs). It uses a trained adaptation model to perform a seq2seq mapping from the often-imperfect reasonings of the original black-box LLM to the correct or improved reasonings. Specifically, the adaptation model is initialized with a relatively small open-source LLM and adapted over a collection of sub-sampled training pairs. To select the representative pairs of correct…
Taxonomy
Topics: Natural Language Processing Techniques · Data Quality and Management
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
