Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias
Hazel Kim, Philip Torr

TL;DR
This paper introduces MoLaCE, a lightweight framework that mitigates confirmation bias in large language models by dynamically mixing latent concept experts, enhancing robustness and debate capabilities efficiently.
Contribution
MoLaCE is a novel, inference-time method that dynamically reweights latent concepts to reduce confirmation bias in LLMs, applicable to single and multi-agent debate settings.
Findings
Reduces confirmation bias across various prompts.
Improves robustness and factual correctness.
Matches or surpasses multi-agent debate performance with less computation.
Abstract
Large language models (LLMs) are highly vulnerable to input confirmation bias. When a prompt implies a preferred answer, models often reinforce that bias rather than explore alternatives. This phenomenon remains underexplored, yet it is already harmful in base models and poses an even greater risk in multi-agent debate, where echo chambers reinforce bias instead of correction. We introduce Mixture of Latent Concept Experts (MoLaCE), a lightweight inference-time framework that addresses confirmation bias by mixing experts instantiated as different activation strengths over latent concepts that shape model responses. Our key insight is that, due to the compositional nature of language, differently phrased prompts reweight latent concepts in prompt-specific ways that affect factual correctness, so no single fixed intervention can be applied universally across inputs. This design enables a…
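The abstract describes experts as different activation strengths over a latent concept that shapes the response. A minimal sketch of that idea follows, using NumPy on toy vectors rather than a real model. The steering direction is built as a contrastive mean difference, in the general style of Contrastive Activation Addition. Everything else is an assumption made for illustration and is not taken from the paper: the squared-distance gate, the strength grid `alphas`, the readout matrix `W_out`, and all function names.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def steering_vector(pos_acts, neg_acts):
    # Contrastive direction: mean activation of "positive" prompts
    # minus mean activation of "negative" prompts.
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def mix_experts(h, v, W_out, alphas, temperature=1.0):
    # Each expert shifts the hidden state h by alpha * v_unit and reads out
    # a token distribution through W_out (a toy stand-in for the model head).
    v_unit = v / np.linalg.norm(v)
    # Hypothetical gate (an assumption, not the paper's): measure how far the
    # prompt already leans along v, then favour the expert whose shift brings
    # that projection closest to zero.
    lean = float(h @ v_unit)
    gate = softmax([-((lean + a) ** 2) / temperature for a in alphas])
    expert_probs = np.stack([softmax(W_out @ (h + a * v_unit)) for a in alphas])
    # Prompt-specific mixture of the experts' output distributions.
    return gate @ expert_probs, gate
```

Because the gate depends on the prompt's own activation, two differently phrased prompts receive different mixtures over the same set of strengths, which is the property the abstract contrasts with a single fixed intervention.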
Peer Reviews
Decision·ICLR 2026 Conference Desk Rejected Submission
- Confirmation bias in LLMs is a critical practical issue. Understanding the connection between single-agent bias and multi-agent echo chambers provides insight into addressing practical issues. Formulating the problem using latent concepts provides a principled framework.
- Training-free intervention using CAA is practical for any LLM. The proposed method requires significantly lower computational costs than multi-agent debate and can therefore be integrated into existing systems without addi
- The core components (Contrastive Activation Addition, mixture-of-experts architecture, and the debate framework) are all established techniques. While their combination for bias mitigation is new, the paper does not introduce fundamentally novel methods or theoretical insights beyond applying existing tools in a new context.
- As stated in l.311 and l.315, "For TruthfulQA, correctness is automatically judged by both Gemini 2.5 Pro and GPT-5", and "To systematically study confirmation bias, we c
The paper is well motivated and provides a good operationalization of input confirmation bias, which it uses to study the internal representation and to propose interesting interventions based on the findings. The proposed method makes sense theoretically and seems to work well in practice.
Weaknesses: 1) Do you have any sense of the settings in which MoLaCE would be most effective? Consider a dataset where the ground-truth distribution is highly skewed (e.g., a fact-checking task where 99% of statements are true). In such a scenario, a consistently "pro-truth" prompt framing would be a very effective strategy for maximizing accuracy. By steering the model away from this beneficial bias and toward neutrality, MoLaCE could paradoxically decrease performance on this subset of the data. H
- **[Important motivation]** The paper focuses on confirmation bias, which is a key hurdle in the effectiveness of multi-LLM inference (or even single-LLM inference over multiple timesteps).
- **[Training-free method]** MoLaCE operates entirely at inference time through activation steering; no additional fine-tuning or data are required. I'm quite impressed with the way in which the authors construct the steering vector that they use for modifying the LLM's generation during inference time. I ini
Overall, this is a paper I wanted to like! The motivation is strong and the idea is very interesting, but the work feels incomplete. The evaluation is too narrow, the baselines omit the most relevant prior methods, and the theoretical component is a bit muddy. With a more fleshed-out presentation of the theoretical foundation and with more complete experiments, this could be a very nice paper.
- **[Narrow and outdated evaluation]** The study tests only on BoolQ, MMLU, and TruthfulQA (datasets t
* The numbers look very strong; I’m very surprised by the level of improvement obtained.
* The method itself is interesting and creative, and would be valuable if it worked well. I'd imagine there are a number of extensions if something like this works well, and variations which could work even better.
* While there aren't many evaluations (3), they are pretty reasonable and cover some breadth of applications (especially MMLU).
* I’d love to see this work on more datasets. I’m not sure whether this method is designed around TruthfulQA, though the fact that it helps on MMLU is encouraging. I’d be interested to know whether it helps broadly across, say, 10 datasets, including other high-profile ones like GPQA. I’d also be interested to know what the improvements are on specific MMLU subsets, to see where the gains are coming from.
* It’s unclear to me how the method works, in particular how the earlier work on Debate+ wo
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Forecasting Techniques and Applications · Topic Modeling
