StyleShot: A Snapshot on Any Style
Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

TL;DR
StyleShot introduces a style-aware encoder and a comprehensive style dataset to enable generalized style transfer without test-time tuning, demonstrating superior performance across diverse styles.
Contribution
The paper presents a novel style-aware encoder and StyleGallery dataset that together enable effective, generalized style transfer without the need for test-time tuning.
Findings
Achieves superior style transfer performance compared to state-of-the-art methods.
Effectively generalizes to various styles including 3D, flat, and fine-grained.
Does not require test-time tuning for style transfer.
Abstract
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this by constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With a dedicated design for style learning, the style-aware encoder is trained with a decoupling training strategy to extract expressive style representations, and StyleGallery enables the generalization ability. We further employ a content-fusion encoder to enhance image-driven style transfer. We highlight that our approach, named StyleShot, is simple yet effective in mimicking various desired styles, e.g., 3D, flat, abstract, or even fine-grained styles, without test-time tuning. Rigorous experiments validate that StyleShot achieves superior performance across a wide range of styles compared to existing state-of-the-art methods. The…
Peer Reviews
Decision·Submitted to ICLR 2025
The results look interesting and the method is straightforward.
In Figure 7, the results from StyleAligned look odd; can you detail how they were generated? For training the style encoder, it seems that a crafted dataset was used. Can you provide the training details for the model, including the encoder and the feature-injection part of the diffusion model?
Overall, the reviewer tends to classify this paper's originality as removing limitations from prior results. The proposed method is reasonable and has good motivation. The focus of learning style representations for style transfer tasks is valid. Some of the generated results are impressive.
- Overall, the reviewer would not rate the technical contribution of the proposed method highly. Being based on IP-Adapter, the proposed method is somewhat incremental. It seems to be a successful attempt to combine MoE and IP-Adapter.
- The reviewer is confused about the claim that the authors "show that a good style representation is crucial" for style transfer, since CAST has already reached related conclusions and demonstrated the usefulness of learning a good style encoder.
- Related to the fir
1. The artistic images generated by the proposed method are impressive.
2. This paper provides a new style-balanced dataset.
3. The proposed method is simple yet effective.
4. Extensive experiments are conducted to evaluate the performance of the proposed method.
1. The quantitative results are not satisfying. The CLIP scores reported in Table 1 show that the proposed method is merely comparable to previous methods (no gain). Although the authors claim that these metrics are not ideal for evaluating style transfer, they are still among the most widely used and authoritative metrics in this field.
2. When comparing with existing text-driven style transfer methods, several more state-of-the-art approaches (such as DreamStyler, T2I-Adapter, and In
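For readers unfamiliar with the metric the review refers to, CLIP-based scores are generally computed as the cosine similarity between CLIP embeddings (for example, a generated image against a text prompt or a reference style image), averaged over a test set. The following is a minimal sketch under stated assumptions: the embeddings are taken as already extracted, the function names `cosine_score` and `mean_pairwise_score` are hypothetical, and the toy vectors are placeholders rather than values from the paper. It is not the authors' evaluation code.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_pairwise_score(
    generated: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity over aligned (generated, reference) pairs."""
    if not generated or len(generated) != len(reference):
        raise ValueError("need equally sized, non-empty embedding lists")
    scores = [cosine_score(g, r) for g, r in zip(generated, reference)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy placeholder embeddings (hypothetical, NOT results from the paper).
    generated = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
    reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(f"mean score: {mean_pairwise_score(generated, reference):.3f}")
```

Because the score only measures alignment in the embedding space, two methods can obtain similar values while differing visibly in style fidelity, which is consistent with the authors' caveat that the reviewer mentions.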
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Artificial Intelligence in Games
