Mitigating Legibility Tax with Decoupled Prover-Verifier Games
Yegon Kim, Juho Lee

TL;DR
This paper introduces a decoupled prover-verifier game framework that improves the checkability of large language model outputs by training a translator to convert solutions into a checkable form, reducing the legibility tax.
Contribution
It proposes a novel decoupled training approach with a translator model to enhance checkability without sacrificing correctness in prover-verifier systems.
Findings
Decoupled training reduces legibility tax.
Translator maintains solver's answer fidelity.
Framework achieves faithful and checkable outputs.
Abstract
As large language models become increasingly capable, it is critical that their outputs can be easily checked by less capable systems. Prover-verifier games can be used to improve the checkability of model outputs, but they display a degradation in accuracy compared to a baseline trained only to maximize correctness -- a phenomenon named the legibility tax. We propose a solution by decoupling the correctness condition from the checkability condition and instead training a "translator" model that turns a fixed solver model's solution into a checkable form. This allows us to first train the solver to maximize correctness, and then train the translator to translate the solver's solution into a checkable form while retaining the solver's answer. To accommodate this new objective of translation, we formulate a decoupled prover-verifier game where the equilibria correspond to faithful and checkable translators.
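The two quantities the abstract relies on can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the game's reward, so the `Translation` record, the `verifier_score` field, and the gating rule in `translator_reward` are all hypothetical. Only the definition of the legibility tax as an accuracy gap against a correctness-only baseline, and the requirement that a translation retain the solver's answer, come from the text.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    """Hypothetical record for one translated solution (names are assumptions)."""
    solver_answer: str       # final answer produced by the fixed solver
    translated_answer: str   # final answer stated in the translator's output
    verifier_score: float    # assumed verifier acceptance score in [0, 1]


def legibility_tax(baseline_accuracy: float, prover_accuracy: float) -> float:
    """Accuracy lost relative to a baseline trained only to maximize correctness."""
    return baseline_accuracy - prover_accuracy


def is_faithful(t: Translation) -> bool:
    """A translation is faithful if it retains the solver's answer."""
    return t.translated_answer.strip() == t.solver_answer.strip()


def translator_reward(t: Translation) -> float:
    """Illustrative reward only (an assumption, not the paper's objective):
    the verifier's score counts only when the translation is faithful."""
    return t.verifier_score if is_faithful(t) else 0.0
```

In this sketch correctness never enters the translator's reward: the solver is fixed, so the translator is scored only on keeping the solver's answer and on how well the verifier can check the result, which is the decoupling the abstract describes.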
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
