Beyond Benchmarks: How Users Evaluate AI Chat Assistants

Moiz Sadiq Awan; Muhammad Haris Noor; Muhammad Salman Munaf

arXiv:2603.25220 · cs.HC · March 27, 2026

TL;DR

This study compares user satisfaction, motivations, and frustrations across seven major AI chat platforms, revealing that users treat these tools as interchangeable utilities and that platform specialization sustains competition.

Contribution

The paper presents what the authors describe as the first systematic cross-platform survey of user satisfaction, adoption motivations, and frustrations, administered with a consistent instrument across seven AI chat platforms.

Findings

01

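The top three platforms (Claude, ChatGPT, and DeepSeek) receive statistically indistinguishable satisfaction ratings despite vast differences in funding, team size, and benchmark performance.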

02

Over 80% of users use two or more platforms, and switching costs are negligible.

03

Different platforms attract users for distinct reasons, supporting market diversity.

Abstract

Automated benchmarks dominate the evaluation of large language models, yet no systematic study has compared user satisfaction, adoption motivations, and frustrations across competing platforms using a consistent instrument. We address this gap with a cross-platform survey of 388 active AI chat users, comparing satisfaction, adoption drivers, use case performance, and qualitative frustrations across seven major platforms: ChatGPT, Claude, Gemini, DeepSeek, Grok, Mistral, and Llama. Three broad findings emerge. First, the top three platforms (Claude, ChatGPT, and DeepSeek) receive statistically indistinguishable satisfaction ratings despite vast differences in funding, team size, and benchmark performance. Second, users treat these tools as interchangeable utilities rather than sticky ecosystems: over 80% use two or more platforms, and switching costs are negligible. Third, each platform…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Ethics and Social Impacts of AI