ALSO: Adversarial Online Strategy Optimization for Social Agents
Xiang Li, Liping Yi, Mingze Kong, Min Zhang, Zhongxiang Dai, QingHua Hu

TL;DR
ALSO introduces an online adversarial bandit framework with neural reward prediction to enable dynamic strategy adaptation in social agents, outperforming static approaches in non-stationary multi-turn dialogues.
Contribution
The paper presents the first framework for online strategy optimization in social simulation, addressing non-stationarity with adversarial bandits and neural reward prediction.
Findings
ALSO outperforms static baselines in dynamic environments.
The neural surrogate improves reward prediction and exploration.
ALSO effectively adapts strategies in multi-turn social interactions.
Abstract
Social simulation provides a compelling testbed for studying social intelligence, where agents interact through multi-turn dialogues under evolving contexts and strategically adapting opponents. Such environments are inherently non-stationary, requiring agents to dynamically adjust their strategies over time. However, most Large Language Model (LLM)-based social agents rely on static personas, while existing approaches for enhancing social intelligence, such as offline reinforcement learning or external planners, are ill-suited to these settings, typically assuming stationarity and incurring substantial training overhead. To bridge this gap, we propose ALSO (Adversarial onLine Strategy Optimization), the first framework for online strategy optimization in multi-agent social simulation. ALSO advances social adaptation through two key…
