Artemis: Structured Visual Reasoning for Perception Policy Learning

Wei Tang; Yanpeng Sun; Shan Zhang; Xiaofan Li; Piotr Koniusz; Wei Li; Na Zhao; Zechao Li

arXiv:2512.01988 · cs.CV · December 2, 2025

TL;DR

Artemis introduces a structured, proposal-based visual reasoning framework for perception policy learning, improving performance and generalization by aligning reasoning with spatial and object-centric representations.

Contribution

The paper presents Artemis, a perception-policy learning framework that performs structured, proposal-based reasoning in a spatial, object-centric space, addressing the limitations of purely linguistic intermediate reasoning.

Findings

01. Achieves strong performance on grounding and detection tasks.

02. Generalizes well to counting and geometric perception tasks.

03. Improves perception-policy learning through spatially grounded reasoning.

Abstract

Recent reinforcement-learning frameworks for visual perception policy have begun to incorporate intermediate reasoning chains expressed in natural language. Empirical observations indicate that such purely linguistic intermediate reasoning often reduces performance on perception tasks. We argue that the core issue lies not in reasoning per se but in the form of reasoning: while these chains perform semantic reasoning in an unstructured linguistic space, visual perception requires reasoning in a spatial and object-centric space. In response, we introduce Artemis, a perception-policy learning framework that performs structured proposal-based reasoning, where each intermediate step is represented as a (label, bounding-box) pair capturing a verifiable visual state. This design enables explicit tracking of intermediate states, direct supervision for proposal quality, and avoids ambiguity…
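
To make the idea of a (label, bounding-box) reasoning step concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes boxes in `(x1, y1, x2, y2)` corner format and uses intersection-over-union (IoU) against same-label reference boxes as one plausible way to check a proposal; the names `Proposal`, `iou`, and `proposal_quality` are hypothetical, and the abstract does not specify the actual supervision signal Artemis uses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Proposal:
    """One intermediate reasoning step: a (label, bounding-box) pair."""
    label: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def proposal_quality(step: Proposal, references: List[Proposal]) -> float:
    """Hypothetical quality score: best IoU against references sharing the label.

    Returns 0.0 when no reference has the same label. This is an illustrative
    stand-in, not the reward defined in the paper.
    """
    scores = [iou(step.box, r.box) for r in references if r.label == step.label]
    return max(scores, default=0.0)
```

Because each step is a concrete box with a label, it can be scored directly against annotations, which is much harder to do for a free-form sentence of linguistic reasoning.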

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning