Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders

Yiming Tang; Arash Lagzian; Srinivas Anumasa; Qiran Zou; Yingtao Zhu; Ye Zhang; Trang Nguyen; Yih-Chung Tham; Ehsan Adeli; Ching-Yu Cheng; Yilun Du; Dianbo Liu

arXiv:2508.18236·cs.CV·April 23, 2026

TL;DR

This paper introduces LanSE, a tool that decomposes images into interpretable visual patterns with natural language descriptions to improve analysis of AI-generated content.

Contribution

LanSE is a novel method that automatically identifies and describes visual patterns, enhancing content evaluation and extending to various data modalities.

Findings

01. Discovered over 5,000 visual patterns with 93% human agreement.

02. Outperforms existing methods in decomposed evaluation.

03. First systematic evaluation of physical plausibility in AI content analysis.

Abstract

The rapid development of generative AI has transformed content creation, communication, and human development. However, this technology raises profound concerns in high-stakes domains, demanding rigorous methods to analyze and evaluate AI-generated content. While existing analytic methods often treat images as indivisible wholes, real-world AI failures generally manifest as specific visual patterns that can evade holistic detection and call for more granular, decomposed analysis. Here we introduce a content analysis tool, Language-Grounded Sparse Encoders (LanSE), which decomposes images into interpretable visual patterns with natural language descriptions. Utilizing interpretability modules and large multimodal models, LanSE can automatically identify visual patterns within data modalities. Our method discovers more than 5,000 visual patterns with 93% human agreement, provides…
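To make the idea of a "language-grounded sparse" feature concrete, here is a minimal Python sketch. It is not the LanSE implementation, whose architecture this page does not describe. It only assumes that an image has already been turned into an embedding vector and that each sparse unit carries a text description. The `LabeledFeature` class, the `active_patterns` function, the weights, and the three example descriptions are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledFeature:
    """One sparse unit paired with a text description (hypothetical structure)."""
    weights: List[float]
    bias: float
    description: str


def active_patterns(
    embedding: List[float],
    features: List[LabeledFeature],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return (description, activation) for the strongest units that fire.

    Each unit computes ReLU(w . x + b); units with zero activation are dropped,
    so only a few described patterns remain per image.
    """
    scored = []
    for feature in features:
        pre_activation = sum(w * x for w, x in zip(feature.weights, embedding)) + feature.bias
        activation = max(0.0, pre_activation)  # ReLU keeps the code sparse
        if activation > 0.0:
            scored.append((feature.description, activation))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy units with made-up descriptions; a real system would learn the weights
# and obtain descriptions from a multimodal model.
features = [
    LabeledFeature([1.0, 0.0, 0.0], -0.5, "extra fingers on a hand"),
    LabeledFeature([0.0, 1.0, 0.0], -0.5, "floating object without support"),
    LabeledFeature([0.0, 0.0, 1.0], -0.5, "blurred background text"),
]
embedding = [2.0, 0.25, 1.0]  # stand-in for an image embedding

result = active_patterns(embedding, features)
for description, activation in result:
    print(f"{activation:.2f}  {description}")
```

In this toy setting, an image is summarized by the handful of described patterns whose units fire, rather than by a single holistic score, which is the kind of decomposed analysis the abstract argues for.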
