Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus
Dipkumar Patel

TL;DR
This paper measures representational collapse in multi-agent LLM committees, where agents built from the same model produce highly similar chain-of-thought rationales, and proposes DALC, a diversity-aware consensus protocol that uses embedding diversity to reach 87% on GSM8K versus 84% for self-consistency at 26% lower token cost.
Contribution
It introduces DALC, a training-free protocol that uses embedding geometry to compute diversity weights, mitigating collapse and enhancing consensus in multi-agent LLM systems.
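As a rough illustration of what "diversity weights from embedding geometry" could look like, the sketch below down-weights agents whose rationale embeddings sit close to the rest of the committee and then takes a weighted vote. The weighting rule (one minus mean cosine similarity to the other agents) and the function names `diversity_weights` and `weighted_vote` are assumptions made for this example; the summary does not give DALC's actual formula, and the hint-sharing step is not modeled here.

```python
from collections import defaultdict

import numpy as np


def diversity_weights(embeddings):
    """Hypothetical rule: weight_i is proportional to 1 - mean cosine
    similarity between agent i and every other agent."""
    E = np.asarray(embeddings, dtype=float)
    n = len(E)
    if n < 2:
        raise ValueError("need at least two agents")
    U = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    S = U @ U.T                                       # pairwise cosine matrix
    mean_sim_to_others = (S.sum(axis=1) - 1.0) / (n - 1)  # drop self-similarity
    raw = np.clip(1.0 - mean_sim_to_others, 1e-9, None)   # keep weights positive
    return raw / raw.sum()                                # normalize to sum to 1


def weighted_vote(answers, weights):
    """Return the answer with the largest total weight."""
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)


# Toy committee: agents 0 and 1 have identical embeddings, agent 2 differs.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_weights = diversity_weights(toy_embeddings)
toy_winner = weighted_vote(["18", "20", "20"], toy_weights)
```

Under this particular rule the redundant pair each receive less weight than the distinct agent, which is the qualitative behavior a diversity-aware vote is after; other rules would satisfy the same description.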
Findings
DALC achieves 87% accuracy on GSM8K, outperforming self-consistency at 84%.
Representational collapse is measurable and worsens on harder tasks.
Embedding choice significantly affects collapse severity and downstream accuracy.
Abstract
Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across 100 GSM8K questions with three Qwen2.5-14B agents, mean cosine similarity is 0.888 and effective rank is 2.17 out of 3.0, a failure mode we term representational collapse. DALC, a training-free consensus protocol that computes diversity weights from embedding geometry, reaches 87% on GSM8K versus 84% for self-consistency at 26% lower token cost. Ablation experiments reveal 1-3 point per-protocol run-to-run variance, confirm that hint sharing contributes more than diversity weighting alone, and show that encoder choice strongly modulates collapse severity (cosine 0.908 with mxbai versus 0.888 with nomic) and…
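The two collapse statistics quoted in the abstract can be sketched as follows. Mean pairwise cosine similarity is standard; for effective rank the abstract does not say which definition is used, so this example assumes the entropy-based one (the exponential of the Shannon entropy of the normalized singular values), which ranges from 1 to the number of agents. The function names and the toy inputs are hypothetical.

```python
import numpy as np


def _unit_rows(embeddings):
    """Scale each agent's embedding to unit length."""
    E = np.asarray(embeddings, dtype=float)
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct agent pairs."""
    U = _unit_rows(embeddings)
    S = U @ U.T
    upper = np.triu_indices(len(U), k=1)  # indices of pairs i < j
    return float(S[upper].mean())


def effective_rank(embeddings):
    """Assumed definition: exp(entropy) of the normalized singular values."""
    U = _unit_rows(embeddings)
    s = np.linalg.svd(U, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                          # avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))


# Toy checks: fully collapsed committee vs. fully diverse committee.
collapsed = [[1.0, 2.0, 3.0]] * 3
diverse = np.eye(3)
```

With three agents, identical rationales give a mean cosine of 1 and an effective rank of 1, while mutually orthogonal rationales give 0 and 3. The reported 0.888 and 2.17 sit between these extremes.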
