TL;DR
This paper introduces PAVE, a four-module cognitive architecture enabling generative agents to reason about and execute legitimate rule violations in cooperative scenarios, enhancing interpretability and plausibility.
Contribution
The paper presents PAVE, a novel architecture with explicit legitimacy assessment and scoped violations, implemented in Voville for evaluating rule-breaking behaviors in LLM-based agents.
Findings
PAVE agents satisfy properties of legitimate violation, authority deference, bounded scope, and recovery.
PAVE agents behave in a more structured and interpretable way than vanilla models and are rated more plausible.
Ablation of the legitimacy gate causes vanilla-like failures.
Abstract
Generative agents based on large language models can reproduce believable human behavior in cooperative settings, but how they should reason in situations where rule-breaking may be required, such as fire evacuations or authority-supervised emergencies, remains poorly characterized. We propose PAVE (Perception, Assessment, Verdict, Emulation), a novel four-module cognitive architecture that addresses this gap end to end: (i) Perception extracts a structured context with explicit authority distance, peer behaviors, and severity-tagged situational cues; (ii) Assessment scores the context along five scalars, including an explicit legitimacy judgment that checks necessity, proportionality, and absence of alternatives; (iii) Verdict decides whether to comply or violate under a hard legitimacy gate, with a per-agent threshold elicited from the persona; (iv) Emulation enacts the verdict and scopes the…
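The Verdict step's hard legitimacy gate can be illustrated with a minimal Python sketch. It rests on assumptions that are not details from the paper: Assessment is taken to return a confidence in [0, 1] for each of the three legitimacy criteria, legitimacy is taken to be the minimum of those confidences (so any single weak criterion blocks a violation), and the persona-derived threshold is compared directly against that legitimacy score. The names `LegitimacyScores`, `decide`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    COMPLY = "comply"
    VIOLATE = "violate"


@dataclass(frozen=True)
class LegitimacyScores:
    """Hypothetical Assessment output for the three legitimacy criteria."""
    necessity: float        # how necessary breaking the rule is, in [0, 1]
    proportionality: float  # how proportionate the violation is, in [0, 1]
    no_alternative: float   # confidence that no compliant alternative exists

    def __post_init__(self) -> None:
        # Reject scores outside the assumed [0, 1] range.
        for name in ("necessity", "proportionality", "no_alternative"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    def legitimacy(self) -> float:
        # Assumption: min() acts as a soft AND, so one weak criterion
        # is enough to keep the overall legitimacy low.
        return min(self.necessity, self.proportionality, self.no_alternative)


def decide(scores: LegitimacyScores, threshold: float) -> Verdict:
    """Hard gate: violate only when legitimacy reaches the agent's threshold.

    `threshold` stands in for the per-agent value elicited from the persona.
    The function takes no other input, so nothing can override a failed gate.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must lie in [0, 1], got {threshold}")
    return Verdict.VIOLATE if scores.legitimacy() >= threshold else Verdict.COMPLY


if __name__ == "__main__":
    cautious_threshold = 0.8  # hypothetical value for a rule-abiding persona
    # Fire evacuation: breaking the rule is necessary, proportionate, unavoidable.
    fire = LegitimacyScores(necessity=0.95, proportionality=0.9, no_alternative=0.85)
    # Same urgency, but a compliant route exists, so the gate should block.
    detour = LegitimacyScores(necessity=0.95, proportionality=0.9, no_alternative=0.1)
    print(decide(fire, cautious_threshold))    # Verdict.VIOLATE
    print(decide(detour, cautious_threshold))  # Verdict.COMPLY
```

Taking the minimum is a deliberately conservative choice: a weighted average would let a high necessity score compensate for an available compliant alternative, which is exactly what a hard gate is meant to prevent.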
