Hierarchical Process Reward Models are Symbolic Vision Learners

Shan Zhang; Aotian Chen; Kai Zou; Jindong Gu; Yuan Xue; Anton van den Hengel

arXiv:2512.03126 · cs.CV · December 4, 2025

TL;DR

This paper introduces a novel self-supervised symbolic auto-encoder for diagram understanding that encodes geometric primitives and their relationships, combining hierarchical reinforcement learning with neuro-symbolic reasoning for improved interpretability and performance.

Contribution

The paper proposes a hierarchical process reward model, together with stabilization techniques for reinforcement learning in symbolic vision, to improve diagram reconstruction and reasoning.

Findings

1. Achieved a 98.2% reduction in MSE for diagram reconstruction.
2. Surpassed GPT-4o by 0.6% on chart reconstruction with a 7B model.
3. Improved results on perception and reasoning benchmarks by +13% and +3%, respectively.
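Reading the MSE finding as a relative reduction (an assumption; this page does not name the baseline the error is compared against), the 98.2% figure corresponds to:

```latex
\[
\frac{\mathrm{MSE}_{\text{base}} - \mathrm{MSE}_{\text{ours}}}{\mathrm{MSE}_{\text{base}}} = 0.982
\quad\Longrightarrow\quad
\mathrm{MSE}_{\text{ours}} = 0.018\,\mathrm{MSE}_{\text{base}}
\]
```

Under that reading, the reconstruction error is more than 50 times smaller than the baseline's.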

Abstract

Symbolic computer vision represents diagrams through explicit logical rules and structured representations, enabling interpretable understanding in machine vision. This requires learning paradigms fundamentally different from those of pixel-based visual models. Symbolic visual learners parse diagrams into geometric primitives (points, lines, and shapes), whereas pixel-based learners operate on textures and colors. We propose a novel self-supervised symbolic auto-encoder that encodes diagrams into structured primitives and their interrelationships within the latent space, and decodes them through our executable engine to reconstruct the input diagrams. Central to this architecture is Symbolic Hierarchical Process Reward Modeling, which applies hierarchical step-level parsing rewards to enforce point-on-line, line-on-shape, and shape-on-relation consistency. Since vanilla reinforcement learning…
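The abstract names three levels of step-level consistency (point-on-line, line-on-shape, shape-on-relation) but does not give the reward formula on this page. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Parse` structure, the single `shares_vertex` relation predicate, the tolerance `TOL`, and the gating rule in which each level's score is multiplied by the score of the level beneath it are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
TOL = 1e-6  # hypothetical geometric tolerance


@dataclass
class Line:
    start: Point
    end: Point


@dataclass
class Parse:
    """Hypothetical parsed diagram: primitives plus the incidences it claims."""
    points: Dict[str, Point]
    lines: Dict[str, Line]
    point_on_line: List[Tuple[str, str]]   # (point, line) incidence claims
    shapes: Dict[str, List[str]]           # shape -> ordered edge line names
    relations: List[Tuple[str, str, str]]  # (predicate, shape_a, shape_b)


def close(a: Point, b: Point) -> bool:
    return abs(a[0] - b[0]) <= TOL and abs(a[1] - b[1]) <= TOL


def on_segment(p: Point, ln: Line) -> bool:
    (x, y), (x1, y1), (x2, y2) = p, ln.start, ln.end
    # Collinear (zero cross product) and inside the segment's bounding box.
    cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    in_box = (min(x1, x2) - TOL <= x <= max(x1, x2) + TOL
              and min(y1, y2) - TOL <= y <= max(y1, y2) + TOL)
    return abs(cross) <= TOL and in_box


def fraction(checks: List[bool]) -> float:
    # A level with no claims is treated as vacuously satisfied.
    return sum(checks) / len(checks) if checks else 1.0


def point_level(p: Parse) -> float:
    return fraction([on_segment(p.points[a], p.lines[b]) for a, b in p.point_on_line])


def is_closed(p: Parse, edges: List[str]) -> bool:
    segs = [p.lines[e] for e in edges]
    # Each edge must end where the next edge (cyclically) starts.
    return all(close(segs[i].end, segs[(i + 1) % len(segs)].start)
               for i in range(len(segs)))


def line_level(p: Parse) -> float:
    return fraction([is_closed(p, edges) for edges in p.shapes.values()])


def relation_holds(p: Parse, pred: str, a: str, b: str) -> bool:
    if pred == "shares_vertex":
        va = [p.lines[e].start for e in p.shapes[a]]
        vb = [p.lines[e].start for e in p.shapes[b]]
        return any(close(u, v) for u in va for v in vb)
    return False  # unknown predicates earn no credit in this sketch


def relation_level(p: Parse) -> float:
    return fraction([relation_holds(p, *r) for r in p.relations])


def hierarchical_reward(p: Parse) -> float:
    # Assumed gating: each level is scaled by the level beneath it, so
    # relation credit requires well-placed points and closed shapes.
    r_point = point_level(p)
    r_line = r_point * line_level(p)
    r_rel = r_line * relation_level(p)
    return (r_point + r_line + r_rel) / 3.0


# Usage: two triangles that share the vertex (0, 3).
LINES = {
    "AB": Line((0, 0), (4, 0)), "BC": Line((4, 0), (0, 3)), "CA": Line((0, 3), (0, 0)),
    "CD": Line((0, 3), (-2, 3)), "DE": Line((-2, 3), (-2, 5)), "EC": Line((-2, 5), (0, 3)),
}
good = Parse(points={"M": (2, 0), "P": (2, 1.5)}, lines=LINES,
             point_on_line=[("M", "AB"), ("P", "BC")],
             shapes={"T1": ["AB", "BC", "CA"], "T2": ["CD", "DE", "EC"]},
             relations=[("shares_vertex", "T1", "T2")])
print(hierarchical_reward(good))  # → 1.0
```

Gating higher levels on lower ones is one way to express the hierarchy: a parse cannot collect full relation-level credit for shapes built from misplaced points or unclosed edges. Whether the paper combines its levels this way is not stated on this page.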

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics