A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias

Brandon Smith; Mohamed Reda Bouadjenek; Tahsin Alamgir Kheya; Phillip Dawson; Sunil Aryal

arXiv:2505.09056·cs.CL·May 15, 2025·2 cites

TL;DR

This study analyzes the output similarity, diversity, and bias of 12 large language models across 5,000 prompts, revealing differences in variability, style, and ethical considerations to inform future AI development.

Contribution

It provides a comprehensive comparison of LLM output similarity, diversity, and bias across multiple models and tasks, highlighting key behavioral differences.

Findings

01

Outputs from the same LLM are more similar to each other than to human texts.

02

GPT-4 produces more varied responses than WizardLM-2-8x22b.

03

Some models show higher gender balance and reduced bias.

Abstract

Large Language Models (LLMs) represent a major step toward artificial general intelligence, significantly advancing our ability to interact with technology. While LLMs perform well on Natural Language Processing tasks such as translation, generation, code writing, and summarization, questions remain about their output similarity, variability, and ethical implications. For instance, how similar are texts generated by the same model? How does this compare across different models? And which models best uphold ethical standards? To investigate, we used 5,000 prompts spanning diverse tasks like generation, explanation, and rewriting. This resulted in approximately 3 million texts from 12 LLMs, including proprietary and open-source systems from OpenAI, Google, Microsoft, Meta, and Mistral. Key findings include: (1) outputs from the same LLM are more similar to each other than to…

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Dropout · Residual Connection · Multi-Head Attention · Dense Connections · Layer Normalization