CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation
Yunfan Yang, Cuiling Lan, Jitao Sang, Yan Lu

TL;DR
This paper introduces CSPO, a reinforcement learning framework that improves table-to-LaTeX conversion by disentangling rewards for structure, style, and content, leading to more accurate and reliable generation.
Contribution
CSPO is a novel RL method that assigns component-specific rewards, reducing reward ambiguity and enhancing structured table generation from images.
Findings
CSPO outperforms baseline models in hierarchical evaluation metrics.
Component-specific rewards improve fidelity in structure, style, and content.
Targeted optimization leads to more reliable table-to-LaTeX conversion.
Abstract
Tables contain rich structured information, yet when stored as images their contents remain "locked" within pixels. Converting table images into LaTeX code enables faithful digitization and reuse, but current multimodal large language models (MLLMs) often fail to preserve structural, style, or content fidelity. Conventional post-training with reinforcement learning (RL) typically relies on a single aggregated reward, leading to reward ambiguity that conflates multiple behavioral aspects and hinders effective optimization. We propose Component-Specific Policy Optimization (CSPO), an RL framework that disentangles optimization across the components of a LaTeX table: structure, style, and content. In particular, CSPO assigns component-specific rewards and backpropagates each signal only through the tokens relevant to its component, alleviating reward ambiguity and enabling targeted component-wise…
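The idea of routing each reward only to its own tokens can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss, so the group normalization below (a GRPO-style choice), the function names `group_normalize` and `token_advantages`, and the toy labels and rewards are all assumptions made for illustration. It shows how two sampled outputs with the same total reward yield no learning signal under an aggregated reward, while component-specific rewards still produce distinct per-token signals.

```python
from statistics import mean, pstdev

# Hypothetical component names, taken from the abstract's three aspects.
COMPONENTS = ("structure", "style", "content")


def group_normalize(rewards):
    """Center and scale rewards across a group of sampled outputs.

    Assumption: a GRPO-style normalization; the paper may use another scheme.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so there is nothing to prefer.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def token_advantages(token_components, component_rewards):
    """Give each token the advantage of the component it belongs to.

    token_components: per sample, a list with one component label per token.
    component_rewards: per sample, a dict mapping component -> reward.
    """
    # Normalize each component's reward separately across the group.
    adv = {
        c: group_normalize([r[c] for r in component_rewards])
        for c in COMPONENTS
    }
    # A token only receives the signal of its own component.
    return [
        [adv[label][i] for label in labels]
        for i, labels in enumerate(token_components)
    ]


# Toy group of two sampled outputs, three tokens each (illustrative only).
labels = [
    ["structure", "style", "content"],
    ["structure", "style", "content"],
]
rewards = [
    {"structure": 1.0, "style": 0.0, "content": 1.0},  # wrong style
    {"structure": 1.0, "style": 1.0, "content": 0.0},  # wrong content
]

per_token = token_advantages(labels, rewards)
# per_token == [[0.0, -1.0, 1.0], [0.0, 1.0, -1.0]]

# Single aggregated reward: both samples sum to 2.0, so the signal vanishes.
aggregated = group_normalize([sum(r.values()) for r in rewards])
# aggregated == [0.0, 0.0]
```

In this toy case the aggregated reward cannot tell the two outputs apart, whereas the component-wise version penalizes the style token of the first output and the content token of the second. In a real system the per-token labels would come from parsing the generated LaTeX, which this sketch does not attempt.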
