An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring

Sana Ebrahimi; Mohsen Dehghankar; Abolfazl Asudeh

arXiv:2505.24239·cs.MA·June 2, 2025

TL;DR

This paper introduces a credibility scoring framework for multi-agent LLM systems that enhances resistance to adversarial agents by iteratively learning and applying credibility scores during collaborative tasks.

Contribution

The paper proposes a credibility scoring method that makes multi-agent LLM systems more robust to adversarial agents, and it models the collaborative query-answering process as an iterative game.

Findings

01. Effective in mitigating adversarial influence

02. Enhances resilience in adversary-majority scenarios

03. Improves collaborative output quality

Abstract

While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To address this issue, we introduce a general, adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game in which the agents communicate and contribute to a final system output. Our system associates a credibility score with each agent and uses these scores when aggregating the team outputs. The credibility scores are learned gradually from each agent's past contributions to query answering. Our experiments across multiple tasks and settings demonstrate our system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in adversary-majority settings.
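The idea of aggregating team outputs by credibility and learning the scores over rounds can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not give the update rule, so the sketch swaps in a simple multiplicative-weights style penalty, and it assumes that every agent returns a discrete answer each round and that a feedback signal naming the correct answer is available afterwards. The names `aggregate`, `update_credibility`, `run_rounds`, and the `penalty` parameter are all hypothetical.

```python
from collections import defaultdict


def aggregate(answers, credibility):
    """Return the answer whose supporters have the largest total credibility."""
    totals = defaultdict(float)
    for agent, answer in answers.items():
        totals[answer] += credibility[agent]
    return max(totals, key=totals.get)


def update_credibility(credibility, answers, correct_answer, penalty=0.5):
    """Shrink the score of every agent whose answer disagreed with the feedback.

    Assumption: a multiplicative penalty stands in for whatever learning
    rule the paper actually uses. Agents that gave no answer are penalized too.
    """
    return {
        agent: score if answers.get(agent) == correct_answer else score * penalty
        for agent, score in credibility.items()
    }


def run_rounds(rounds, credibility):
    """Play the iterative game: aggregate first, then learn from feedback."""
    outputs = []
    for answers, correct_answer in rounds:
        outputs.append(aggregate(answers, credibility))
        credibility = update_credibility(credibility, answers, correct_answer)
    return outputs, credibility


# Adversary-majority toy setting: two honest agents answer "A",
# three adversaries always answer "B"; the correct answer is "A".
answers = {"h1": "A", "h2": "A", "a1": "B", "a2": "B", "a3": "B"}
initial = {agent: 1.0 for agent in answers}
outputs, final = run_rounds([(answers, "A")] * 3, initial)
print(outputs)
print(final)
```

In this toy run the adversarial majority wins the first round, because all scores start equal, and loses every later round once its combined credibility (1.5 after one penalty) falls below that of the two honest agents (2.0), so `outputs` is `['B', 'A', 'A']`.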

Taxonomy

Topics: Network Security and Intrusion Detection