# The Assessment of Body Image Based on Large Language Model

**Authors:** Fumeng Li, Nan Zhao

PMC · DOI: 10.1002/pchj.70048 · PsyCh Journal · 2025-08-30

## TL;DR

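This paper introduces an LLM-based method for assessing body image across four dimensions, validated on self-reported texts from 194 university students and on social media posts, and reports higher criterion validity than a dictionary-based method and expert ratings.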

## Contribution

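A multidimensional body image assessment (perception, positive attitude, negative attitude, behavior) using prompt-engineered LLMs, with higher criterion validity than dictionary-based and expert ratings and ecological validity checked on social media posts.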

## Key findings

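- LLM-based assessments outperformed both the dictionary method and human expert ratings: zero-shot LLMs reached r = 0.664 in positive attitude (vs. expert r = 0.657), and DeepSeek-R1 reached r = 0.722 in perception.
- Role-playing prompts improved criterion validity in the perception dimension (Δr = +0.117).
- On social media posts, the Qwen model reached r = 0.651 in the behavior dimension, 53.1% higher than the dictionary method.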
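
The "53.1% higher" figure reads most naturally as a relative increase over the dictionary method's correlation. Below is a minimal sketch of that arithmetic, assuming the percentage is computed as (r_llm − r_dict) / r_dict. The summary does not state the formula, the dictionary correlation is back-calculated rather than reported here, and the function names are illustrative.

```python
def relative_gain_pct(r_new: float, r_base: float) -> float:
    """Relative increase of r_new over r_base, in percent."""
    return (r_new - r_base) / r_base * 100.0


def implied_baseline(r_new: float, gain_pct: float) -> float:
    """Baseline correlation implied by r_new and a relative gain in percent."""
    return r_new / (1.0 + gain_pct / 100.0)


# Values from the summary: Qwen r = 0.651, reported as 53.1% higher.
r_qwen = 0.651
# Back-calculated under the relative-increase assumption; not a reported value.
r_dict_implied = implied_baseline(r_qwen, 53.1)

print(f"implied dictionary r = {r_dict_implied:.3f}")
print(f"check: gain = {relative_gain_pct(r_qwen, r_dict_implied):.1f}%")
```

Under this assumption the dictionary method's correlation would be roughly 0.425. The role-playing improvement, by contrast, is stated as a difference in r (Δr) rather than as a percentage.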

## Abstract

Assessing adolescent body image is crucial for mental health interventions, yet traditional methods suffer from limited dimensional coverage, poor dynamic tracking, and weak ecological validity. To address these gaps, this study proposes a multidimensional evaluation using large language models (LLMs) and compares its criterion validity against a dictionary‐based method and expert ratings. We defined four dimensions—perception, positive attitude, negative attitude, behavior—by reviewing the body‐image literature and built a validated dictionary through expert ratings and iterative refinement. A four‐step prompt‐engineering process, incorporating role‐playing and other optimization techniques, produced tailored prompts for LLM‐based recognition. To validate these tools, we collected self‐reported texts and scale scores from 194 university students, performed semantic analyses with Llama‐3.1‐70B, Qwen‐Max, and DeepSeek‐R1 using these prompts, and confirmed ecological validity on social media posts. Results indicate that our multidimensional dictionary correlated significantly with expert ratings across all four dimensions (r = 0.515–0.625), providing a solid benchmark. LLM‐based assessments then outperformed both the dictionary and human ratings, with zero‐shot LLMs achieving r = 0.664 in positive attitude (vs. expert r = 0.657) and DeepSeek‐R1 reaching r = 0.722 in perception. Role‐playing techniques significantly improved the validity in the perception dimension (Δr = +0.117). Consistency checks revealed that the DeepSeek model reduced error dispersion in extreme score ranges by 48.4% compared to human ratings, with the 95% consistency limits covering the fluctuations of human scores. Incremental validity analysis showed that LLMs could replace human evaluations in the perception dimension (ΔR² = 0.220). In ecological validity checks, the Qwen model achieved a correlation of 0.651 in the social media behavior dimension—53.1% higher than the dictionary method. We found that LLMs demonstrated significant advantages in the multidimensional assessment of body image, offering a new intelligent approach to mental health measurement.
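
The abstract relies on two statistics: criterion validity (the correlation between a rater's scores and the scale scores) and incremental validity (the gain in R² when one rater's scores are added to another's in a regression). The following is a minimal sketch of how such quantities are commonly computed, not the authors' analysis code. The data are synthetic stand-ins, and the function names, noise levels, and regression setup are assumptions made for illustration.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def r_squared(predictors, y) -> float:
    """R^2 of an ordinary least-squares fit of y on predictors plus intercept."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def incremental_r2(base, added, y) -> float:
    """Gain in R^2 when `added` joins `base` as a predictor of y."""
    full = np.column_stack([np.asarray(base, float), np.asarray(added, float)])
    return r_squared(full, y) - r_squared(base, y)


# Synthetic stand-in data (NOT the study's data): 194 simulated participants.
rng = np.random.default_rng(0)
scale = rng.normal(size=194)                     # criterion: scale score
human = scale + rng.normal(scale=1.2, size=194)  # hypothetical noisier rater
llm = scale + rng.normal(scale=0.8, size=194)    # hypothetical less noisy rater

print("criterion validity (human):", round(pearson_r(human, scale), 3))
print("criterion validity (LLM):  ", round(pearson_r(llm, scale), 3))
print("delta R^2 (LLM over human):", round(incremental_r2(human, llm, scale), 3))
```

In this setup, a positive ΔR² for the LLM scores over the human scores means the LLM ratings explain variance in the scale scores that the human ratings do not, which is the sense in which the abstract reports ΔR² = 0.220 for the perception dimension.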

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12520832/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12520832/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12520832/full.md

---
Source: https://tomesphere.com/paper/PMC12520832