Ensemble Debates with Local Large Language Models for AI Alignment
Ephraiem Sarabamoun

TL;DR
This paper explores whether debates among ensembles of local open-source models can improve alignment reasoning in large language models, and reports gains over single-model baselines across 15 scenarios.
Contribution
It introduces an ensemble debate framework built on open-source models, shows improved alignment-related reasoning, and provides code, prompts, and a debate dataset for reproducibility.
Findings
Ensembles outperform single-model baselines on a 7-point rubric (overall: 3.48 vs. 3.13).
Largest gains in reasoning depth (+19.4%) and argument quality (+34.1%).
Improvements are strongest for truthfulness (+1.25 points) and human enhancement (+0.80).
Abstract
As large language models (LLMs) take on greater roles in high-stakes decisions, alignment with human values is essential. Reliance on proprietary APIs limits reproducibility and broad participation. We study whether local open-source ensemble debates can improve alignment-oriented reasoning. Across 150 debates spanning 15 scenarios and five ensemble configurations, ensembles outperform single-model baselines on a 7-point rubric (overall: 3.48 vs. 3.13), with the largest gains in reasoning depth (+19.4%) and argument quality (+34.1%). Improvements are strongest for truthfulness (+1.25 points) and human enhancement (+0.80). We release code, prompts, and a debate dataset as an accessible and reproducible foundation for ensemble-based alignment evaluation.
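The abstract mixes two units: percentage gains for some rubric dimensions and point gains for others. As a minimal sketch of how the two relate, the snippet below converts the reported overall scores (3.48 vs. 3.13) into an absolute and a relative gain. The function name `relative_gain` is hypothetical, and the resulting percentage is derived here from the abstract's figures; it is not a number the paper itself reports.

```python
def relative_gain(ensemble: float, baseline: float) -> float:
    """Return the percentage improvement of `ensemble` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (ensemble - baseline) / baseline * 100.0


# Overall rubric scores as stated in the abstract (7-point scale).
overall_ensemble = 3.48
overall_single = 3.13

# Absolute gain in rubric points and the same gain as a percentage.
absolute = overall_ensemble - overall_single
relative = relative_gain(overall_ensemble, overall_single)

print(f"absolute gain: {absolute:.2f} points")
print(f"relative gain: {relative:.1f}%")
```

Under these inputs the overall gain is about a third of a rubric point, which is smaller in relative terms than the per-dimension percentages quoted for reasoning depth and argument quality.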
Taxonomy
Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Artificial Intelligence in Healthcare and Education
