Evaluating Cultural and Social Awareness of LLM Web Agents
Haoyi Qiu, Alexander R. Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu

TL;DR
This paper introduces CASA, a benchmark for evaluating large language model agents' cultural and social awareness in web-based tasks, revealing current limitations and exploring methods to improve their sensitivity to norms.
Contribution
The paper presents CASA, a novel benchmark for assessing LLM agents' cultural and social norm awareness, and evaluates methods like prompting and fine-tuning to enhance performance.
Findings
Current LLM agents have less than 10% awareness coverage.
Agents exhibit violation rates above 40% when facing misleading web content.
Combining prompting and fine-tuning improves cultural adaptability.
Abstract
As large language models (LLMs) expand into performing as agents for real-world applications beyond traditional NLP tasks, evaluating their robustness becomes increasingly important. However, existing benchmarks often overlook critical dimensions like cultural and social awareness. To address these, we introduce CASA, a benchmark designed to assess LLM agents' sensitivity to cultural and social norms across two web-based tasks: online shopping and social discussion forums. Our approach evaluates LLM agents' ability to detect and appropriately respond to norm-violating user queries and observations. Furthermore, we propose a comprehensive evaluation framework that measures awareness coverage, helpfulness in managing user queries, and the violation rate when facing misleading web content. Experiments show that current LLMs perform significantly better in non-agent than in web-based agent…
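The three measures named in the abstract (awareness coverage, helpfulness, and violation rate) can be illustrated with a minimal sketch. It assumes each measure is a simple fraction over judged episodes; the paper's actual definitions may differ (for example, helpfulness might be conditioned on awareness). The `Episode` class, its fields, and the `summarize` function are hypothetical names introduced only for illustration and are not part of CASA.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # All fields are hypothetical labels, e.g. assigned by a human or LLM judge.
    kind: str               # "user_query" or "observation"
    aware: bool = False     # agent recognized the norm violation in the user query
    helpful: bool = False   # agent steered the user toward an appropriate alternative
    violated: bool = False  # agent followed misleading web content and broke a norm


def rate(flags):
    """Fraction of True values; 0.0 for an empty collection."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def summarize(episodes):
    """Assumed definitions: plain ratios over the relevant episode type."""
    queries = [e for e in episodes if e.kind == "user_query"]
    observations = [e for e in episodes if e.kind == "observation"]
    return {
        "awareness_coverage": rate(e.aware for e in queries),
        "helpfulness": rate(e.helpful for e in queries),
        "violation_rate": rate(e.violated for e in observations),
    }
```

Under these assumptions, higher awareness coverage and helpfulness are better, while a lower violation rate is better.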
Taxonomy
Topics: Wikis in Education and Collaboration · Multi-Agent Systems and Negotiation · Spam and Phishing Detection
