Fine-Grained Spatially Varying Material Selection in Images
Julia Guerrero-Viu, Michael Fischer, Iliyan Georgiev, Elena Garces, Diego Gutierrez, Belen Masia, Valentin Deschaintre

TL;DR
This paper introduces a multi-resolution, vision transformer-based method for fine-grained material selection in images. The method is robust to lighting and reflectance variations, supports selection at both the texture and subtexture levels, and is intended for downstream image editing tasks.
Contribution
It proposes a novel multi-resolution ViT approach for material selection, along with a new two-level material selection dataset (DuMaS) that provides dense annotations at both the texture and subtexture levels.
Findings
Finer and more stable selection results than prior methods.
Effective handling of lighting and reflectance variations.
Supports selection at two material levels (texture and subtexture) for detailed downstream editing.
Abstract
Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset which includes dense annotations for over 800,000 synthetic images, both on the texture and subtexture levels.
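The idea of selecting pixels from features computed at several resolutions can be illustrated with a minimal sketch. This is not the paper's architecture: in practice the feature grids would come from a ViT, whereas here they are tiny hand-written grids, and the fusion rule (nearest-neighbour upsampling followed by averaging), the cosine-similarity comparison against a clicked pixel, and the 0.9 threshold are all assumptions made for illustration. The function names are hypothetical.

```python
import math

def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C feature grid (nested lists)."""
    in_h, in_w = len(feat), len(feat[0])
    return [[feat[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def fuse_multires(feature_maps, out_h, out_w):
    """Average feature grids from different resolutions (assumed fusion rule)."""
    ups = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    n, channels = len(ups), len(ups[0][0][0])
    return [[[sum(u[y][x][c] for u in ups) / n for c in range(channels)]
             for x in range(out_w)] for y in range(out_h)]

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def select_material(fused, click_yx, threshold=0.9):
    """Boolean mask of pixels whose fused feature resembles the clicked pixel's."""
    qy, qx = click_yx
    query = fused[qy][qx]
    return [[cosine(px, query) >= threshold for px in row] for row in fused]

# Toy stand-ins for ViT features: material A = [1, 0], material B = [0, 1].
A, B = [1.0, 0.0], [0.0, 1.0]
coarse = [[A, B], [A, B]]              # 2 x 2 grid (low resolution)
fine = [[A, A, B, B] for _ in range(4)]  # 4 x 4 grid (high resolution)

fused = fuse_multires([coarse, fine], 4, 4)
mask = select_material(fused, click_yx=(0, 0))
for row in mask:
    print(row)
```

Clicking a pixel in the left half selects the two left columns, which carry material A in both toy grids. With real features, the coarse grid would contribute more global context and the fine grid sharper boundaries, which is the intuition behind combining resolutions.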
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Image Enhancement Techniques
Methods: Linear Layer · Softmax · Attention Is All You Need · Multi-Head Attention · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer
