Exploring the AI Obedience: Why is Generating a Pure Color Image Harder than CyberPunk?

Hongyu Li; Kuan Liu; Yuan Chen; Juntao Hu; Huimin Lu; Guanjie Chen; Xue Liu; Guangming Lu; Hong Huang

arXiv:2603.00166·cs.CV·May 12, 2026

TL;DR

This paper investigates why generative AI models struggle with simple tasks like generating pure color images, introducing a hierarchical framework called AI Obedience and a benchmark named Violin to evaluate model precision and alignment.

Contribution

The paper formalizes the concept of AI Obedience, introduces the Violin benchmark for deterministic tasks, and provides insights into model performance and alignment issues.

Findings

01. Closed-source models outperform open-source models in deterministic tasks.

02. Performance on the benchmark correlates with natural image generation quality.

03. Models exhibit an 'aesthetic bias' that hampers simple, low-entropy task execution.

Abstract

Recent advances in generative AI have shown human-level performance in complex content creation. However, we identify a "Paradox of Simplicity": models that can render complex scenes often fail at trivial, low-entropy tasks, such as generating a uniform pure color image. We argue this is a systemic failure related to uncontrollable emergent abilities. As models scale, strong priors for aesthetics and complexity override deterministic simplicity, creating an "aesthetic bias" that hinders the model's transition from data simulation to true intellectual abstraction. To better investigate this problem, we formalize the concept of AI Obedience, a hierarchical framework that grades a model's ability to transition from probabilistic approximation to pixel-level determinism (Levels 1 to 5). We introduce Violin, the first systematic benchmark designed to evaluate Level 4 Obedience through three…
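
To make the notion of pixel-level determinism concrete, the following is a minimal sketch of how one might check whether a generated image is a uniform pure color. It is an illustration only and not the metric used by Violin: the function name `pure_color_deviation`, the nested-list image representation, and the three reported statistics are all assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def pure_color_deviation(image: List[List[Sequence[int]]], target: Pixel) -> Dict[str, float]:
    """Compare every pixel of `image` against a single target color.

    Hypothetical helper: returns the fraction of pixels that match the
    target exactly, the mean absolute per-channel error, and the largest
    per-channel error found anywhere in the image.
    """
    total = 0          # number of pixels visited
    exact = 0          # pixels identical to the target color
    abs_err_sum = 0    # sum of absolute channel errors over all pixels
    max_err = 0        # worst single-channel error

    for row in image:
        for pixel in row:
            total += 1
            errs = [abs(int(c) - int(t)) for c, t in zip(pixel, target)]
            abs_err_sum += sum(errs)
            max_err = max(max_err, max(errs))
            if tuple(pixel) == tuple(target):
                exact += 1

    if total == 0:
        raise ValueError("image contains no pixels")

    return {
        "exact_fraction": exact / total,
        "mean_abs_error": abs_err_sum / (3 * total),
        "max_abs_error": float(max_err),
    }


if __name__ == "__main__":
    red = (255, 0, 0)
    # A 2x2 image where one pixel drifts slightly from pure red.
    img = [[red, red], [red, (250, 0, 0)]]
    print(pure_color_deviation(img, red))
```

A strictly deterministic output would give an `exact_fraction` of 1.0 and zero error, while gradients, textures, or noise introduced by the model would show up as nonzero deviations.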

Code & Models

Datasets

Perkzi/VIOLIN
dataset · 181 downloads
