Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
Yiming Tang, Arash Lagzian, Srinivas Anumasa, Qiran Zou, Yingtao Zhu, Ye Zhang, Trang Nguyen, Yih-Chung Tham, Ehsan Adeli, Ching-Yu Cheng, Yilun Du, Dianbo Liu

TL;DR
This paper introduces LanSE, a tool that decomposes images into interpretable visual patterns with natural language descriptions to improve analysis of AI-generated content.
Contribution
LanSE is a novel method that automatically identifies and describes visual patterns, enhancing content evaluation and extending to various data modalities.
Findings
Discovered over 5,000 visual patterns with 93% human agreement.
Outperforms existing methods in decomposed evaluation.
First systematic evaluation of physical plausibility in AI content analysis.
Abstract
The rapid development of generative AI has transformed content creation, communication, and human development. However, this technology raises profound concerns in high-stakes domains, demanding rigorous methods to analyze and evaluate AI-generated content. While existing analytic methods often treat images as indivisible wholes, real-world AI failures generally manifest as specific visual patterns that can evade holistic detection and are better suited to granular, decomposed analysis. Here we introduce a content analysis tool, Language-Grounded Sparse Encoders (LanSE), which decomposes images into interpretable visual patterns with natural language descriptions. Using interpretability modules and large multimodal models, LanSE can automatically identify visual patterns within data modalities. Our method discovers more than 5,000 visual patterns with 93% human agreement, provides…
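The abstract does not specify LanSE's architecture. As a rough illustration of the general idea behind "sparse encoders", the sketch below shows a generic top-k sparse autoencoder applied to image embeddings. Each image is mapped to a small set of active latent units, and each unit can be treated as a candidate visual pattern. This is an assumption, not the authors' method. All names (`topk_sparse_encode`, `decode`, `top_examples`), the dimensions, and the random, untrained weights are hypothetical. The step in which a large multimodal model writes a natural language description for each unit is omitted. The sketch only retrieves the images that most strongly activate a unit, which is the kind of exemplar set such a model might be shown.

```python
import numpy as np

def topk_sparse_encode(x, W_enc, b_enc, k):
    """Encode embeddings x (n, d) into non-negative codes (n, m), keeping at most k active units per row."""
    acts = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU over pre-activations
    if k < acts.shape[1]:
        # Positions of all but the k largest activations in each row are zeroed out
        drop = np.argpartition(acts, -k, axis=1)[:, :-k]
        np.put_along_axis(acts, drop, 0.0, axis=1)
    return acts

def decode(codes, W_dec, b_dec):
    """Reconstruct embeddings as a sparse combination of dictionary directions."""
    return codes @ W_dec + b_dec

def top_examples(codes, unit, n_top=3):
    """Indices of the images that most strongly (and positively) activate one latent unit."""
    order = np.argsort(-codes[:, unit])
    return [int(i) for i in order[:n_top] if codes[i, unit] > 0]

# Hypothetical sizes: n images, d-dim embeddings, m dictionary units, k active units per image
rng = np.random.default_rng(0)
n, d, m, k = 8, 16, 64, 4
x = rng.normal(size=(n, d))              # stand-in for image embeddings
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = W_enc.T.copy()                   # tied initialization, untrained
b_dec = np.zeros(d)

codes = topk_sparse_encode(x, W_enc, b_enc, k)
recon = decode(codes, W_dec, b_dec)
patterns_per_image = [np.flatnonzero(row).tolist() for row in codes]
```

In a trained setup, the encoder and decoder would be fit to minimize reconstruction error under the sparsity constraint. The top-k rule is one common way to enforce sparsity; an L1 penalty on the codes is a frequent alternative.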
