Enhancing Answer Reliability Through Inter-Model Consensus of Large Language Models

Alireza Amiri-Margavi; Iman Jebellat; Ehsan Jebellat; Seyed Pouyan Mousavi Davoudi

arXiv:2411.16797·cs.CL·February 25, 2025

Open Access

TL;DR

This paper introduces a collaborative framework where multiple large language models generate and answer complex questions, demonstrating that inter-model consensus improves response reliability and question quality.

Contribution

The study presents a novel multi-model collaboration approach that enhances answer reliability and assesses question quality using statistical agreement measures.

Findings

01 Claude and GPT-4 produce high-quality, less ambiguous questions.

02 Inter-model consensus correlates with increased response reliability.

03 Gemini and LLaMA show greater variability and lower reliability.

Abstract

We propose a collaborative framework in which multiple large language models -- including GPT-4-0125-preview, Meta-LLaMA-3-70B-Instruct, Claude-3-Opus, and Gemini-1.5-Flash -- generate and answer complex, PhD-level statistical questions when definitive ground truth is unavailable. Our study examines how inter-model consensus both improves response reliability and helps identify the quality of the generated questions. Employing chi-square tests, Fleiss' Kappa, and confidence interval analysis, we quantify consensus rates and inter-rater agreement to assess both response precision and question quality. Key results indicate that Claude and GPT-4 produce well-structured, less ambiguous questions with higher inter-rater agreement, as shown by narrower confidence intervals and greater alignment with question-generating models. In contrast, Gemini and LLaMA exhibit greater variability and lower…
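
To make the agreement measure named in the abstract concrete, here is a minimal sketch of Fleiss' Kappa in plain Python. It is the standard textbook formula, not the authors' code. The function name `fleiss_kappa`, the input layout (one list of answer labels per question, one label per model), and the toy answers are all assumptions made for illustration.

```python
from collections import Counter
from math import nan


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Hypothetical helper: each inner list holds the answers that the
    raters (here, models) gave to one question. Every question must
    have the same number of answers, and there must be at least two.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")

    counts = [Counter(item) for item in ratings]

    # Per-item agreement P_i: share of rater pairs that gave the same label.
    p_items = [
        (sum(n * n for n in c.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the pooled label proportions.
    totals = Counter()
    for c in counts:
        totals.update(c)
    p_e = sum((n / (n_items * n_raters)) ** 2 for n in totals.values())

    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined (0 / 0).
        return nan
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy data (made up): four models answer three multiple-choice questions.
    answers = [
        ["A", "A", "A", "B"],
        ["B", "B", "B", "B"],
        ["A", "A", "B", "B"],
    ]
    print(round(fleiss_kappa(answers), 3))  # prints 0.2
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. How the paper maps model answers to categories is not stated in this excerpt, so the label encoding above is only one possible choice.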

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax · Attention Is All You Need