Dual Reinforcement-Based Specification Generation for Image De-Rendering
Ramakanth Pasunuru, David Rosenberg, Gideon Mann, Mohit Bansal

TL;DR
This paper compares LSTM and Transformer decoders for inferring graphics programs from images. It finds that Transformers are robust to the ordering of the target sequence, and it introduces reinforcement learning with multiple rewards to improve decoding, achieving state-of-the-art results.
Contribution
It introduces a policy-gradient reinforcement learning approach with multiple rewards to improve the decoder's inductive bias. It also compares LSTM and Transformer sequence decoders for graphics program inference.
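The contribution combines several rewards in a policy-gradient objective. The sketch below shows how that scoring step could look. It assumes a REINFORCE-style surrogate loss with a scalar baseline and two illustrative rewards: an order-independent object-level F1 on the program and a pixel IoU on a rendering. The `(shape, x, y)` object encoding, `toy_render`, the reward functions, the baseline value, and the 0.5/0.5 weights are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical object encoding for this sketch: (shape, x, y) on an integer grid.
Obj = Tuple[str, int, int]
Reward = Callable[[Sequence[Obj], Sequence[Obj]], float]


def program_f1(pred: Sequence[Obj], ref: Sequence[Obj]) -> float:
    """Program-level reward: order-independent F1 between predicted and reference objects."""
    if not pred and not ref:
        return 1.0
    overlap = sum((Counter(pred) & Counter(ref)).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def toy_render(objs: Sequence[Obj]) -> Set[Tuple[int, int]]:
    """Hypothetical stand-in renderer: every object fills a 2x2 block of pixels."""
    return {(x + dx, y + dy) for _, x, y in objs for dx in (0, 1) for dy in (0, 1)}


def image_iou(pred: Sequence[Obj], ref: Sequence[Obj]) -> float:
    """Image-level reward: IoU of the pixels covered by the two rendered programs."""
    a, b = toy_render(pred), toy_render(ref)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mixed_reward(pred, ref, rewards: List[Reward], weights: List[float]) -> float:
    """Weighted sum of several reward functions for one sampled program."""
    return sum(w * r(pred, ref) for r, w in zip(rewards, weights))


def reinforce_loss(token_log_probs: Sequence[float], reward: float, baseline: float) -> float:
    """REINFORCE surrogate -(R - b) * sum_t log p(y_t).

    If the log-probs come from an autodiff framework, the gradient of this value is the
    policy-gradient estimate; minimizing it makes samples scoring above the baseline likelier.
    """
    return -(reward - baseline) * sum(token_log_probs)


ref = [("circle", 0, 0), ("rect", 4, 4)]
sample = [("rect", 4, 4), ("circle", 1, 0)]  # right objects, one circle shifted by a pixel
r = mixed_reward(sample, ref, [program_f1, image_iou], [0.5, 0.5])  # ≈ 0.55
loss = reinforce_loss([-0.1, -0.2], reward=r, baseline=0.4)  # hypothetical log-probs and baseline
```

In the usage lines, `program_f1` counts the shifted circle as a miss (0.5), while `image_iou` still gives it partial credit (0.6). This is the kind of complementary signal that mixing program-based and image-based rewards can provide.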
Findings
Transformers are less sensitive to sequence ordering than LSTMs.
Reinforcement learning with multiple rewards improves graphics program generation.
The approach achieves state-of-the-art results on two datasets.
Abstract
Advances in deep learning have led to promising progress in inferring graphics programs by de-rendering computer-generated images. However, current methods do not explore which decoding methods lead to better inductive bias for inferring graphics programs. In our work, we first explore the effectiveness of LSTM-RNN versus Transformer networks as decoders for order-independent graphics programs. Since these are sequence models, we must choose an ordering of the objects in the graphics programs for likelihood training. We found that the LSTM performance was highly sensitive to the sequence ordering (random order vs. pattern-based order), while Transformer performance was roughly independent of the sequence ordering. Further, we present a policy gradient based reinforcement learning approach for better inductive bias in the decoder via multiple diverse rewards based both on the graphics…
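The abstract notes that likelihood training requires a fixed ordering of the objects, even though the graphics program itself is order-independent. The sketch below is a hypothetical illustration, not the authors' preprocessing. It shows how the same set of objects produces different target token sequences under a random order and under a fixed sort. The `(shape, x, y)` encoding, the sort key standing in for a "pattern-based" order, and the token format are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical object encoding for this sketch: (shape, x, y) on an integer grid.
Obj = Tuple[str, int, int]


def random_order(objs: Sequence[Obj], rng: random.Random) -> List[Obj]:
    """Random ordering: a fresh permutation of the same objects each time it is called."""
    out = list(objs)
    rng.shuffle(out)
    return out


def fixed_order(objs: Sequence[Obj]) -> List[Obj]:
    """Stand-in for a pattern-based order (assumption): sort by shape, then row y, then column x."""
    return sorted(objs, key=lambda o: (o[0], o[2], o[1]))


def linearize(objs: Sequence[Obj]) -> List[str]:
    """Flatten an ordered object list into decoder target tokens, ending with <eos>."""
    tokens: List[str] = []
    for shape, x, y in objs:
        tokens += [shape, f"x={x}", f"y={y}"]
    return tokens + ["<eos>"]


program = [("rect", 3, 1), ("circle", 2, 0), ("circle", 0, 0)]
rng = random.Random(0)
# Prints ['circle', 'x=0', 'y=0', 'circle', 'x=2', 'y=0', 'rect', 'x=3', 'y=1', '<eos>']
print(linearize(fixed_order(program)))
# Same objects, but the token order depends on the random permutation.
print(linearize(random_order(program, rng)))
```

With teacher forcing, only one of these sequences counts as the correct target for a given training example. A model trained on random orders therefore sees many conflicting targets for the same image. This offers one plausible reading of why an order-sensitive decoder such as an LSTM benefits from a consistent ordering.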
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Video Analysis and Summarization
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Sigmoid Activation · Byte Pair Encoding · Layer Normalization · Residual Connection · Adam · Dropout · Tanh Activation
