EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference
Aayush Kumar

TL;DR
EdgeJury is a lightweight ensemble framework that improves the truthfulness and robustness of small instruction-tuned language models (3B-8B) deployed for serverless edge inference, and it substantially reduces hallucinations in question answering.
Contribution
The paper introduces EdgeJury, a novel multi-stage ensemble method that improves the factual accuracy of small models without external retrieval or calls to large-model APIs.
Findings
Achieves 76.2% accuracy on TruthfulQA (MC1), a 21.4% relative improvement over a single 8B baseline (62.8%).
Yields a 48.2% relative gain on the EdgeCases adversarial set.
Reduces factual hallucinations by approximately 55% relative to single-model baselines.
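The 21.4% TruthfulQA figure is a relative gain over the baseline, not an absolute accuracy difference. As a quick sanity check using only the two accuracies reported in the abstract (76.2% for EdgeJury, 62.8% for the single 8B baseline), the gap is 13.4 percentage points. Recomputing the relative gain from these rounded figures gives about 21.3%, so the reported 21.4% was presumably computed from unrounded accuracies. The helper name below is illustrative only.

```python
def relative_improvement(new_acc: float, baseline_acc: float) -> float:
    """Relative gain of new_acc over baseline_acc, expressed in percent."""
    return (new_acc - baseline_acc) / baseline_acc * 100.0


# TruthfulQA (MC1) accuracies in percent, as rounded in the abstract.
edgejury, single_8b = 76.2, 62.8
absolute = edgejury - single_8b          # difference in percentage points
relative = relative_improvement(edgejury, single_8b)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
# prints: absolute: 13.4 points, relative: 21.3%
```

Reporting both numbers avoids a common misreading: a "21.4% improvement" on a 62.8% baseline means roughly 13 extra points of accuracy, not 21.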
Abstract
Hallucinations hinder reliable question answering, especially in resource-constrained deployments where frontier-scale models or retrieval pipelines may be impractical. We present EdgeJury, a lightweight ensemble framework that improves truthfulness and robustness using only small instruction-tuned language models (3B-8B) suitable for serverless edge inference. EdgeJury orchestrates four stages: (1) parallel role-specialized generation, (2) anonymized cross-review with structured critiques and rankings, (3) chairman synthesis that integrates the strongest content while addressing flagged issues, and (4) claim-level consistency labeling based on inter-model agreement. On TruthfulQA (MC1), EdgeJury achieves 76.2% accuracy (95% CI: 72.8-79.6%), a +21.4% relative improvement over a single 8B baseline (62.8%), and outperforms standard baselines including self-consistency and majority voting…
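The abstract does not say how reviewer rankings are combined or how inter-model agreement is turned into claim labels. The Python sketch below illustrates stages 2 and 4 under explicit assumptions: rankings are aggregated with a Borda count (a swapped-in rule, not necessarily the one EdgeJury uses), claims are assumed to be already extracted and normalized to comparable strings, and the labels `consistent`, `partial`, and `single-source` as well as all function names are hypothetical. Stages 1 and 3 require model calls and are omitted.

```python
from collections import Counter
from typing import Dict, List, Set


def anonymize(answers: Dict[str, str]) -> Dict[str, str]:
    """Stage 2 prep: hide model identities behind neutral labels A, B, C, ...

    Labels follow insertion order here; a real system might also shuffle
    the order so reviewers cannot infer identity from position.
    """
    return {chr(ord("A") + i): text for i, text in enumerate(answers.values())}


def borda_scores(rankings: List[List[str]]) -> Dict[str, int]:
    """Stage 2 (assumed aggregation rule): Borda count over reviewer rankings.

    Each ranking lists anonymized labels best-first; with n candidates the
    top label earns n - 1 points and the last earns 0.
    """
    scores: Counter = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return dict(scores)


def label_claims(claim_sets: List[Set[str]]) -> Dict[str, str]:
    """Stage 4 (hypothetical labels): tag each claim by how many models assert it.

    claim_sets holds one set of normalized claims per model.
    """
    n_models = len(claim_sets)
    # Each set contributes a claim at most once, so k = number of agreeing models.
    counts = Counter(claim for claims in claim_sets for claim in claims)
    labels: Dict[str, str] = {}
    for claim, k in counts.items():
        if k == n_models:
            labels[claim] = "consistent"     # every model asserts it
        elif k >= 2:
            labels[claim] = "partial"        # some, but not all, models agree
        else:
            labels[claim] = "single-source"  # asserted by one model only
    return labels


if __name__ == "__main__":
    labeled = anonymize({"model_1": "answer one", "model_2": "answer two", "model_3": "answer three"})
    rankings = [["B", "A", "C"], ["B", "C", "A"], ["A", "B", "C"]]
    scores = borda_scores(rankings)
    best = max(scores, key=scores.get)  # strongest candidate for the chairman
    print(scores, "-> best:", best, "=", labeled[best])
    print(label_claims([{"c1", "c2"}, {"c1", "c3"}, {"c1", "c2"}]))
```

In the toy run, answer B collects 5 Borda points (A: 3, C: 1), so it would be the strongest candidate handed to the chairman. Claim c1, asserted by all three models, is labeled consistent, c2 partial, and c3 single-source. A Borda count is used here only because it merges several full rankings into one score without extra parameters; the paper may use a different rule.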
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Mobile Crowdsensing and Crowdsourcing
