Hierarchical Process Reward Models are Symbolic Vision Learners
Shan Zhang, Aotian Chen, Kai Zou, Jindong Gu, Yuan Xue, Anton van den Hengel

TL;DR
This paper introduces a novel self-supervised symbolic auto-encoder for diagram understanding that encodes geometric primitives and their relationships, combining hierarchical reinforcement learning with neuro-symbolic reasoning for improved interpretability and performance.
Contribution
It proposes a hierarchical process reward model and stabilization techniques for reinforcement learning in symbolic vision, enhancing diagram reconstruction and reasoning capabilities.
Findings
Achieved a 98.2% reduction in MSE for diagram reconstruction.
Surpassed GPT-4o by 0.6% on chart reconstruction with a 7B model.
Improved results on perception and reasoning benchmarks by 13% and 3%, respectively.
Abstract
Symbolic computer vision represents diagrams through explicit logical rules and structured representations, enabling interpretable understanding in machine vision. This requires fundamentally different learning paradigms from pixel-based visual models. Symbolic visual learners parse diagrams into geometric primitives (points, lines, and shapes), whereas pixel-based learners operate on textures and colors. We propose a novel self-supervised symbolic auto-encoder that encodes diagrams into structured primitives and their interrelationships within the latent space, and decodes them through our executable engine to reconstruct the input diagrams. Central to this architecture is Symbolic Hierarchical Process Reward Modeling, which applies hierarchical step-level parsing rewards to enforce point-on-line, line-on-shape, and shape-on-relation consistency. Since vanilla reinforcement learning…
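The abstract does not spell out how the hierarchical step-level rewards are computed, so the following is only an illustrative sketch of one way such a reward could be structured, not the authors' method. It assumes each level (point-on-line, line-on-shape, shape-on-relation) yields a consistency score in [0, 1], and that a higher level only earns credit to the extent the levels below it are consistent. The names `point_on_segment`, `incidence_score`, and `hierarchical_reward`, the multiplicative gating, and the equal weights are all hypothetical choices made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_on_segment(p: Point, seg: Segment, tol: float = 1e-6) -> bool:
    """Return True if p lies on the segment (collinear and within its extent)."""
    (x1, y1), (x2, y2) = seg
    # Cross product is zero when p is collinear with the segment endpoints.
    cross = (p[0] - x1) * (y2 - y1) - (p[1] - y1) * (x2 - x1)
    if abs(cross) > tol:
        return False
    # Projection onto the segment must fall between the two endpoints.
    dot = (p[0] - x1) * (x2 - x1) + (p[1] - y1) * (y2 - y1)
    sq_len = (x2 - x1) ** 2 + (y2 - y1) ** 2
    return -tol <= dot <= sq_len + tol


def incidence_score(claims: Sequence[Tuple[Point, Segment]]) -> float:
    """Fraction of predicted point-on-line claims that actually hold."""
    if not claims:
        return 1.0  # assumption: no claims means nothing is violated
    return sum(point_on_segment(p, s) for p, s in claims) / len(claims)


def hierarchical_reward(
    level_scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-level scores, ordered from lowest to highest level.

    Assumed scheme: level k contributes the product of scores for levels 1..k,
    so errors at a low level reduce the credit of every level above it.
    """
    if weights is None:
        weights = [1.0] * len(level_scores)
    total, gate = 0.0, 1.0
    for score, weight in zip(level_scores, weights):
        gate *= score
        total += weight * gate
    return total / sum(weights)


# Hypothetical parse: one of two point-on-line claims is wrong.
segment: Segment = ((0.0, 0.0), (2.0, 2.0))
point_level = incidence_score([((1.0, 1.0), segment), ((1.0, 0.0), segment)])
# Line-on-shape and shape-on-relation scores are placeholders here.
reward = hierarchical_reward([point_level, 1.0, 1.0])
```

Under this assumed gating, a parse whose points are only half consistent cannot receive more than half of the maximum reward, even if the placeholder higher-level scores are perfect. An additive scheme without gating would be an equally plausible reading of the abstract.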
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics
