EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference

Aayush Kumar

arXiv:2601.00850 · cs.LG · January 6, 2026

TL;DR

EdgeJury is a lightweight ensemble framework that enhances truthfulness and robustness of small instruction-tuned language models for serverless edge inference, significantly reducing hallucinations in question answering tasks.

Contribution

The paper introduces EdgeJury, a novel multi-stage ensemble method that improves factual accuracy of small models without external retrieval or large APIs.

Findings

01

Achieves 76.2% accuracy on TruthfulQA (MC1), a 21.4% relative improvement over a single 8B baseline (62.8%).

02

Yields a 48.2% relative gain on the EdgeCases adversarial set.

03

Reduces factual hallucinations by approximately 55% compared to single models.
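
The headline TruthfulQA figure is a relative improvement, not a difference in percentage points. The short sketch below reproduces the arithmetic from the two rounded accuracies quoted in the abstract (76.2% and 62.8%); the function and variable names are illustrative only. From the rounded values the relative gain comes out near 21.3%, so the reported 21.4% presumably reflects unrounded accuracies, which is an assumption rather than something the page states.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a percentage of `baseline`."""
    return (new - baseline) / baseline * 100.0


# Rounded accuracies as quoted in the abstract (percent).
ensemble_acc = 76.2
baseline_acc = 62.8

# Absolute gain in percentage points versus relative gain in percent.
absolute_gain = ensemble_acc - baseline_acc
relative_gain = relative_improvement(ensemble_acc, baseline_acc)

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {relative_gain:.1f}%")
```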

Abstract

Hallucinations hinder reliable question answering, especially in resource-constrained deployments where frontier-scale models or retrieval pipelines may be impractical. We present EdgeJury, a lightweight ensemble framework that improves truthfulness and robustness using only small instruction-tuned language models (3B-8B) suitable for serverless edge inference. EdgeJury orchestrates four stages: (1) parallel role-specialized generation, (2) anonymized cross-review with structured critiques and rankings, (3) chairman synthesis that integrates the strongest content while addressing flagged issues, and (4) claim-level consistency labeling based on inter-model agreement. On TruthfulQA (MC1), EdgeJury achieves 76.2% accuracy (95% CI: 72.8-79.6%), a +21.4% relative improvement over a single 8B baseline (62.8%), and outperforms standard baselines including self-consistency and majority voting…
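
The four stages named in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the models and reviewers are plain Python callables standing in for small instruction-tuned models, the chairman simply keeps the top-scored answer instead of synthesizing a new one, claims are split naively on periods, and the label names `consistent` / `uncertain` and the 0.5 agreement threshold are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]            # hypothetical: question -> answer text
Reviewer = Callable[[str, str], float]  # hypothetical: (anonymous label, answer) -> score


def generate(models: Dict[str, Model], question: str) -> Dict[str, str]:
    """Stage 1: every role-specialized model answers the question."""
    # Sequential here for simplicity; the abstract describes this stage as parallel.
    return {role: model(question) for role, model in models.items()}


def cross_review(answers: Dict[str, str], reviewers: List[Reviewer]) -> Dict[str, float]:
    """Stage 2: reviewers score answers shown under anonymous labels."""
    totals = {role: 0.0 for role in answers}
    for reviewer in reviewers:
        for i, role in enumerate(sorted(answers), start=1):
            # The reviewer sees "Answer i", never the role that produced it.
            totals[role] += reviewer(f"Answer {i}", answers[role])
    return totals


def synthesize(answers: Dict[str, str], scores: Dict[str, float]) -> str:
    """Stage 3: stand-in chairman that keeps the top-scored answer."""
    best_role = max(scores, key=scores.get)
    return answers[best_role]


def label_claims(
    final: str, answers: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Stage 4: label each sentence of the final answer by inter-model agreement."""
    labeled = []
    for claim in (s.strip() for s in final.split(".")):
        if not claim:
            continue
        # Fraction of model answers that contain the claim verbatim (toy matcher).
        support = sum(claim.lower() in a.lower() for a in answers.values())
        agreement = support / len(answers)
        labeled.append((claim, "consistent" if agreement >= threshold else "uncertain"))
    return labeled


def edge_jury(question: str, models: Dict[str, Model], reviewers: List[Reviewer]):
    """Run the four stages in order and return the answer with claim labels."""
    answers = generate(models, question)
    scores = cross_review(answers, reviewers)
    final = synthesize(answers, scores)
    return final, label_claims(final, answers)


# Toy stand-ins for three role-specialized models and one length-based reviewer.
models = {
    "analyst": lambda q: "Water boils at 100 C at sea level. It boils lower at altitude.",
    "skeptic": lambda q: "Water boils at 100 C at sea level.",
    "explainer": lambda q: "Water boils at 100 C at sea level. Salt raises it slightly.",
}
reviewers = [lambda label, text: float(len(text))]

final, labels = edge_jury("At what temperature does water boil?", models, reviewers)
for claim, tag in labels:
    print(f"[{tag}] {claim}")
```

In this toy run the claim shared by all three answers is tagged `consistent`, while the claim made by only one model is tagged `uncertain`. A real system would replace the substring matcher with a semantic comparison and the stand-in chairman with a model call.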

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Mobile Crowdsensing and Crowdsourcing