Improving Weak-to-Strong Generalization with Reliability-Aware Alignment

Yue Guo; Yi Yang

arXiv:2406.19032 · cs.CL · June 28, 2024

TL;DR

This paper introduces a reliability-aware alignment method for large language models that improves their ability to generalize from imperfect supervision signals by estimating and utilizing the reliability of weak labels.

Contribution

The paper proposes a novel approach that incorporates answer reliability estimation into the alignment process to enhance weak-to-strong generalization in LLMs.

Findings

01. Effective identification of weak label quality
02. Significant improvement in generalization performance
03. Enhanced robustness to noisy supervision

Abstract

Large language models (LLMs) are now rapidly advancing and surpassing human abilities on many natural language tasks. However, aligning these super-human LLMs with human knowledge remains challenging because the supervision signals from human annotators may be wrong. This issue, known as the "super-alignment" problem, requires enhancing weak-to-strong generalization, where a strong LLM must generalize from imperfect supervision provided by a weaker source. To address this issue, we propose an approach to improve weak-to-strong generalization by involving the reliability of weak supervision signals in the alignment process. In our method, we query the weak supervisor for multiple answers, estimate the answer reliability, and enhance the alignment process by filtering out uncertain data or re-weighting reliable data. Experiments on four datasets demonstrate that our methods effectively…
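
The pipeline the abstract describes (sample several answers from the weak supervisor, estimate how reliable they are, then either filter or re-weight the training data) can be illustrated with a minimal sketch. The abstract does not specify the reliability estimator, so the majority-agreement score and normalized entropy below are assumptions chosen for illustration, as are the function names, the `min_agreement` threshold, and the toy data. This is not the authors' implementation.

```python
import math
from collections import Counter


def estimate_reliability(answers):
    """Score a set of answers sampled from the weak supervisor for one question.

    Returns (majority_answer, agreement, normalized_entropy).
    Agreement and entropy are illustrative proxies for reliability (an assumption).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    majority, top = counts.most_common(1)[0]
    n = len(answers)
    agreement = top / n  # fraction of samples that match the majority answer
    # Shannon entropy of the empirical answer distribution, scaled to [0, 1].
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    max_entropy = math.log(n) if n > 1 else 1.0
    return majority, agreement, entropy / max_entropy


def filter_uncertain(samples, min_agreement=0.6):
    """Keep only questions whose weak answers agree often enough (hypothetical threshold)."""
    kept = []
    for question, answers in samples:
        label, agreement, _ = estimate_reliability(answers)
        if agreement >= min_agreement:
            kept.append((question, label))
    return kept


def reweight(samples):
    """Attach a per-example weight proportional to agreement, normalized to mean 1."""
    labeled = []
    for question, answers in samples:
        label, agreement, _ = estimate_reliability(answers)
        labeled.append((question, label, agreement))
    total = sum(w for _, _, w in labeled)
    n = len(labeled)
    return [(q, y, w * n / total) for q, y, w in labeled]


if __name__ == "__main__":
    # Toy data: four sampled weak answers per question.
    samples = [
        ("q1", ["A", "A", "A", "B"]),  # mostly consistent
        ("q2", ["A", "B", "C", "D"]),  # maximally inconsistent
    ]
    print(filter_uncertain(samples))
    print(reweight(samples))
```

In this sketch, filtering discards questions where the weak supervisor disagrees with itself, while re-weighting keeps every example but scales its contribution. The resulting weights could multiply a per-example loss during fine-tuning of the strong model.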

Code & Models

Repositories

irenehere/reliablealignment (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems · Anomaly Detection Techniques and Applications