Image Quality Assessment for Embodied AI

Chunyi Li; Jiaohao Xiao; Jianbo Zhang; Farong Wen; Zicheng Zhang; Yuan Tian; Xiangyang Zhu; Xiaohong Liu; Zhengxue Cheng; Weisi Lin; Guangtao Zhai

arXiv:2505.16815·cs.CV·October 15, 2025

TL;DR

This paper introduces a new Image Quality Assessment (IQA) framework tailored for Embodied AI, addressing the gap in evaluating image usability for robotic tasks under real-world distortions.

Contribution

It constructs a comprehensive perception-cognition-decision pipeline, creates a large Embodied-IQA database, and evaluates existing IQA methods for embodied applications.

Findings

1. Mainstream IQA methods perform poorly on Embodied-IQA.
2. The Embodied-IQA database contains over 36k reference/distorted image pairs with more than 5 million fine-grained annotations.
3. Specialized IQA methods need to be developed for Embodied AI.

Abstract

Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various distortions in the real world limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, there is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic: IQA for Embodied AI. Specifically, we (1) constructed a perception-cognition-decision-execution pipeline based on the Mertonian system and meta-cognitive theory, and defined a comprehensive subjective score collection process; (2) established the Embodied-IQA database, containing over 36k reference/distorted image pairs, with more than 5 million fine-grained annotations…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 1

Strengths

- The idea of IQA for machines vs humans is established in prior works. This paper reframes the problem for robotics, the stage of how degradation of images affects robot task execution, not just visual recognition.
- The paper was interesting to read, with extensive experiments and detailed analysis.
- The Embodied-IQA database is very large, containing 36,900 image pairs and over 5 million fine-grained annotations, having good scale. It is also annotated along three unique axes, reflecting dif

Weaknesses

- The pipeline assumes vision is the dominant modality, neglecting that in true Embodied AI, perception often must fuse audio, tactile, and temperature cues. Is it possible that temperature might play a role in how bright the image is?
- The writing is often verbose and repeats technical claims across sections, particularly regarding pipeline design and dataset composition.

Reviewer 02·Rating 6·Confidence 3

Strengths

1. The paper presents a clear motivation and significant innovation. It defines the image quality assessment (IQA) problem in embodied intelligence as “image usability for robots,” transcending traditional frameworks based on human or machine vision systems (HVS/MVS). It innovatively models the robot's “decision-making” and “execution” phases explicitly.
2. The research exhibits high quality. First, its constructed dataset is exemplary in scale (36k+ images, 5m+ labels), breadth (30 distortion t

Weaknesses

This study exhibits several critical weaknesses.

1. Methods that evaluate differences based on metrics may inherit and amplify inherent biases and errors within the model and its assessment indicators. Specifically, using metrics like BLEU/ROUGE to measure cognitive comprehension is highly sensitive to phrasing and redundancy, potentially failing to accurately reflect task equivalence. Adopting structured patterns (e.g., action-parameter tuples) or task success classifiers may be more robust alt
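
The reviewer's concern about phrasing sensitivity can be illustrated with a small sketch. It is not the paper's evaluation code: `unigram_f1` is a simplified word-overlap score standing in for BLEU/ROUGE-style metrics, and the instructions and hand-written `(action, object, attribute)` tuples are hypothetical examples chosen only to show the effect.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Word-overlap F1, a simplified stand-in for BLEU/ROUGE-style scores."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def same_action(ref_action: tuple, cand_action: tuple) -> bool:
    """Compare structured (action, object, attribute) tuples exactly."""
    return ref_action == cand_action


# Hypothetical outputs: two phrasings of the same task.
ref_text = "pick up the red cup"
para_text = "grasp the crimson mug"

# Hypothetical hand-normalized tuples for the same two outputs.
ref_action = ("pick", "cup", "red")
para_action = ("pick", "cup", "red")

overlap_score = unigram_f1(ref_text, para_text)
tuples_match = same_action(ref_action, para_action)
print(f"word-overlap F1: {overlap_score:.2f}, tuples match: {tuples_match}")
```

In this toy case the two sentences share only one word, so the overlap score is low even though the structured comparison treats them as the same task. How free-form text would be mapped to tuples in practice is left open here.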

Reviewer 03·Rating 8·Confidence 3

Strengths

1. The paper identifies and clearly defines a completely new, critical, and timely research problem: assessing image quality for Embodied AI. Its theoretical framework based on the "Mertonian system" to differentiate RVS, MVS, and HVS is highly novel and persuasive, laying a solid theoretical foundation for this new field.
2. The paper's main contribution, the Embodied-IQA dataset, is an extremely valuable resource. Its scale and granularity are unprecedented in the IQA field. This dataset will l

Weaknesses

1. The paper defines the VLA "Decision" score as a simple average of errors in three dimensions: Position, Rotation, and State. This metric seems overly simplistic. In real robotics tasks, the importance of these three dimensions can be highly imbalanced (e.g., a minor rotation error could cause catastrophic failure, while a larger position error might still be acceptable).
2. As a benchmark paper, its primary duty is to define the problem and provide data, which it does exceptionally well. Howe
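
The first weakness above, about averaging Position, Rotation, and State errors, can be made concrete with a minimal sketch. This is not the paper's scoring code: it assumes each error is already normalized to [0, 1], and the `DecisionError` class, both functions, and the weights are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecisionError:
    """Per-dimension errors, assumed here to be normalized to [0, 1]."""
    position: float
    rotation: float
    state: float


def mean_error(e: DecisionError) -> float:
    """Unweighted average of the three errors, as the reviewer describes it."""
    return (e.position + e.rotation + e.state) / 3.0


def weighted_error(e: DecisionError, w_pos: float, w_rot: float, w_state: float) -> float:
    """Weighted average; the weights are task-specific assumptions."""
    total = w_pos + w_rot + w_state
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_pos * e.position + w_rot * e.rotation + w_state * e.state) / total


# Two hypothetical outcomes with the same error magnitude in different dimensions.
position_slip = DecisionError(position=0.3, rotation=0.0, state=0.0)
rotation_slip = DecisionError(position=0.0, rotation=0.3, state=0.0)

print(mean_error(position_slip), mean_error(rotation_slip))
print(weighted_error(position_slip, 1.0, 3.0, 1.0),
      weighted_error(rotation_slip, 1.0, 3.0, 1.0))
```

Under the simple average the two outcomes are indistinguishable, while a weighting that penalizes rotation more heavily separates them. Which weights are appropriate would depend on the task and is not specified by the source.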

Code & Models

Repositories

lcysyzxdxc/embodiediqa
Official

Taxonomy

Topics: Cell Image Analysis Techniques · Explainable Artificial Intelligence (XAI) · AI in cancer detection