Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Ruifeng Deng, Xin Li, Errui Ding, Hao Wang

TL;DR
The paper introduces Paint Transformer, a novel neural network that predicts stroke sets for image painting in parallel, enabling near real-time non-photo-realistic image recreation without requiring pre-existing datasets.
Contribution
It formulates neural painting as a set prediction problem and proposes a Transformer-based feed forward model trained via self-supervision, improving efficiency and generalization.
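To make the set prediction framing concrete: a set of strokes has no inherent order, so a training loss should not penalize the model for emitting the right strokes in a different order. The following is a minimal sketch of one permutation-invariant loss, assuming each stroke is a short vector of numeric parameters. The function names, the L1 distance, and the brute-force matching are illustrative assumptions; this page does not describe the loss the paper actually uses.

```python
from itertools import permutations


def stroke_distance(a, b):
    """L1 distance between two stroke parameter vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def set_matching_loss(predicted, target):
    """Mean L1 distance under the best one-to-one pairing of strokes.

    Tries every assignment of predicted strokes to target strokes and
    keeps the cheapest one, so the result does not depend on ordering.
    Brute force over n! pairings: only suitable for very small sets.
    """
    if len(predicted) != len(target):
        raise ValueError("both stroke sets must have the same size")
    n = len(target)
    if n == 0:
        return 0.0
    best_total = min(
        sum(stroke_distance(predicted[i], target[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical 2-parameter strokes, listed in a different order on each side.
predicted = [[0.1, 0.2], [0.8, 0.9]]
target = [[0.8, 0.9], [0.1, 0.2]]
loss = set_matching_loss(predicted, target)
```

Here the two sets contain the same strokes in swapped order, so the best pairing matches them exactly. For realistic set sizes, the brute-force search would be replaced by a polynomial-time assignment solver.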
Findings
Achieves better painting quality than previous methods.
Operates in near real-time for 512x512 images.
Does not require external datasets for training.
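One way training without an external dataset can work is to synthesize the data: sample random stroke parameters, render them onto a blank canvas, and use the rendered image as input with the sampled strokes as the known answer. The sketch below illustrates that idea only. The rectangle-shaped stroke, its five parameters, and the helper names are hypothetical choices for this example, not the stroke model or renderer used in the paper.

```python
import random


def random_stroke(rng):
    """Sample a hypothetical stroke: (cx, cy, width, height, gray), all in [0, 1]."""
    return (
        rng.random(),              # center x
        rng.random(),              # center y
        0.1 + 0.4 * rng.random(),  # width
        0.1 + 0.4 * rng.random(),  # height
        rng.random(),              # gray level
    )


def render(strokes, size=16):
    """Paint axis-aligned rectangles onto a black canvas, later strokes on top."""
    canvas = [[0.0] * size for _ in range(size)]
    for cx, cy, w, h, gray in strokes:
        x0 = max(0, int((cx - w / 2) * size))
        x1 = min(size, int((cx + w / 2) * size) + 1)
        y0 = max(0, int((cy - h / 2) * size))
        y1 = min(size, int((cy + h / 2) * size) + 1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                canvas[y][x] = gray
    return canvas


def make_training_pair(num_strokes=4, size=16, seed=None):
    """Return (image, strokes): a synthetic input and its ground-truth stroke set."""
    rng = random.Random(seed)
    strokes = [random_stroke(rng) for _ in range(num_strokes)]
    return render(strokes, size), strokes


image, strokes = make_training_pair(num_strokes=3, size=8, seed=0)
```

Because the target strokes are generated rather than annotated, an unlimited number of such pairs can be produced on the fly.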
Abstract
Neural painting refers to the procedure of producing a series of strokes for a given image and non-photo-realistically recreating it using neural networks. While reinforcement learning (RL) based agents can generate a stroke sequence step by step for this task, it is not easy to train a stable RL agent. On the other hand, stroke optimization methods search for a set of stroke parameters iteratively in a large search space; such low efficiency significantly limits their prevalence and practicality. Different from previous methods, in this paper, we formulate the task as a set prediction problem and propose a novel Transformer-based framework, dubbed Paint Transformer, to predict the parameters of a stroke set with a feed forward network. This way, our model can generate a set of strokes in parallel and obtain the final painting of size 512 × 512 in near real time. More importantly, since…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Label Smoothing · Residual Connection · Byte Pair Encoding
