# A linguistic comparison between human- and AI-generated content

**Authors:** Flávia A. Rodrigues, Niclas F. Sturm, Flávio L. Pinheiro

PMC · DOI: 10.1016/j.isci.2026.114976 · iScience · 2026-02-12

## TL;DR

This study compares how human and AI-generated Portuguese texts differ in language style and how these differences impact misinformation detection.

## Contribution

The study reveals distinct linguistic patterns in AI-generated Portuguese content and their effect on misinformation detection accuracy.

## Key findings

- AI-generated texts are more formal, positive, and structured than human-written texts.
- Misinformation detection models perform better on human texts (93% accuracy) than on AI-generated texts (75% accuracy).
- Human texts show greater variation in length and use more negative emotions and personal references.
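
The accuracy gap reported above is a per-source comparison: the same detector is scored separately on human-written and LLM-generated texts. A minimal sketch of that bookkeeping follows. The function name `accuracy_by_source`, the record layout, and the toy predictions are all hypothetical and are not taken from the paper; the numbers they produce are unrelated to the 93% and 75% figures.

```python
from collections import defaultdict


def accuracy_by_source(records):
    """Return classification accuracy per text source.

    `records` is an iterable of (source, true_label, predicted_label)
    tuples. This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, y_true, y_pred in records:
        total[source] += 1
        correct[source] += int(y_true == y_pred)
    return {source: correct[source] / total[source] for source in total}


# Made-up predictions, used only to exercise the function.
toy_records = [
    ("human", "false", "false"),
    ("human", "true", "true"),
    ("human", "true", "true"),
    ("human", "false", "true"),  # one miss on human text
    ("llm", "false", "true"),    # one miss on LLM text
    ("llm", "true", "true"),
]

per_source = accuracy_by_source(toy_records)
print(per_source)
```

Reporting accuracy per source rather than pooled keeps a weak result on one group from being hidden by a strong result on the other.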

## Abstract

This study explores the linguistic differences between AI-generated content and human-written texts, particularly in Portuguese. We created two datasets: one with factual and false human-written texts, and another with texts generated by advanced large language models (LLMs: GPT-4o, Mistral Large, and Llama 3.3 70B) using various prompts. Using tools such as Linguistic Inquiry and Word Count (LIWC) and the Sparse Additive Generative Model (SAGE), we identified distinctive traits: AI-generated text tends to be more formal, structured, positive, and motivational, while human texts vary more in length, exhibit negative emotions, and often use personal references. Additionally, a misinformation detection model performed well on human texts (93% accuracy) but struggled with LLM outputs (75% accuracy). This highlights the unique linguistic patterns of AI-generated misinformation and underscores the need for better detection methods to tackle misleading content in Portuguese.
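
To give a rough feel for how a method like SAGE surfaces words that distinguish one corpus from another, the sketch below scores each word by a smoothed log-probability ratio between two corpora. This is a deliberately simpler stand-in, not SAGE itself and not the authors' pipeline: SAGE fits sparse, regularized deviations from a background distribution, which this sketch does not do. The function name `distinctive_words`, the whitespace tokenizer, the smoothing constant `alpha`, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def distinctive_words(corpus_a, corpus_b, alpha=1.0):
    """Score words by log P(word | A) - log P(word | B).

    Uses add-alpha smoothing over the joint vocabulary. Positive scores
    mark words more typical of corpus_a, negative scores mark words
    more typical of corpus_b.
    """
    counts_a = Counter(w for doc in corpus_a for w in doc.lower().split())
    counts_b = Counter(w for doc in corpus_b for w in doc.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }


# Made-up toy sentences, not drawn from the study's datasets.
human_texts = ["eu acho que isso foi ruim", "eu odeio isso"]
llm_texts = ["vale notar que o tema merece cuidado",
             "o tema merece ser considerado"]

scores = distinctive_words(human_texts, llm_texts)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])   # words most typical of the human toy corpus
print(ranked[-3:])  # words most typical of the LLM toy corpus
```

On these toy inputs the first-person pronoun "eu" gets a positive score, loosely mirroring the abstract's observation that human texts use more personal references.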

- We compare the linguistic characteristics of content authored by humans and LLMs
- This study focuses on content in Portuguese and contrasts true and false information
- We show that AI-generated misinformation is more formal, positive, and structured
- Such characteristics affect the performance of misinformation detection models

Artificial intelligence; Social sciences; Linguistics

## Full-text entities

- **Diseases:** communicable diseases (MESH:D003141), respiratory pathologies (MESH:D012131), Non-communicable diseases (MESH:D000073296), Cardiac problems (MESH:D006331), cerebrovascular accidents (MESH:D020521), cancer (MESH:D009369), diabetes (MESH:D003920), COVID-19 (MESH:D000086382), anxiety (MESH:D001007), death (MESH:D003643), respiratory diseases (MESH:D012140)
- **Chemicals:** alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12969083/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12969083/full.md

## References

81 references — full list in the complete paper: https://tomesphere.com/paper/PMC12969083/full.md

---
Source: https://tomesphere.com/paper/PMC12969083