Facial Makeup Transfer Combining Illumination Transfer
Xin Jin, Rui Han, Ning Ning, Xiaodong Li, Xiaokun Zhang

TL;DR
This paper introduces a real-time facial makeup transfer method that combines illumination transfer, enabling effective and quick virtual makeup application using only a single reference image, suitable for a Windows platform.
Contribution
The paper presents a novel real-time facial makeup transfer approach that incorporates illumination transfer, improving effectiveness and speed over existing deep learning methods.
Findings
Effective transfer of dark and white makeup using illumination transfer
Real-time processing within seconds
Accurate makeup transfer even with reference images with air-bangs
| Notation | Meaning |
|---|---|
| | Input image |
| | Reference image (after warping) |
| | Output image |
| | Facial structure layer |
| | Facial detail layer |
| | CIELAB facial color layer a |
| | CIELAB facial color layer b |
| | Weight controlling the degree of blending and in |
| | Weight controlling the illumination transfer and in |
| | Image pixel point |
| | Skin region of the facial image |
| Methods | Environment | Time |
|---|---|---|
| Liu et al. IJCAI 2016 [4] | image pair using TITAN X GPU | 6s |
| Our method | image pair using iPhone6 | 2s |
∗Corresponding author: Xiaodong Li (e-mail: [email protected]).
XIN JIN 1,2,3, RUI HAN 1, NING NING 1, XIAODONG LI 1,*, AND XIAOKUN ZHANG 1
1 Department of Cyber Security, Beijing Electronic Science and Technology Institute, Beijing, 100070, PR China
2 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, 100191, PR China
3 Department of Automation, Tsinghua University, Beijing, 100083, PR China
Abstract
To meet women's needs for trying on makeup styles, we present a novel virtual-experience approach to facial makeup transfer, developed into application software for the Windows platform. The makeup effects are rendered on the user’s input image in real time using only a single reference image. Using landmarked facial feature points, the input image and the reference image are each divided into three layers: a facial structure layer, a facial color layer, and a facial detail layer. Besides processing these layers with different algorithms to generate the output image, we also add illumination transfer, so that the illumination effect of the reference image is automatically transferred to the input image. Our approach has the following three advantages: (1) black or dark and white facial makeup can be effectively transferred by introducing illumination transfer; (2) facial makeup is transferred efficiently, within seconds, compared to methods based on deep learning frameworks; (3) makeup can be transferred successfully from reference images with air-bangs.
Index Terms:
Facial Makeup Transfer, Single Reference Image, Illumination Transfer, Facial Parsing, Efficient and Effective.
I Introduction
Facial makeup transfer is a new application requirement for virtual reality technology in image processing. Many young women want to preview a virtual makeup effect on their own image. Facial makeup is a technique that changes one's appearance with special cosmetics such as compact, setting powder, and moisturizer. In many circumstances, particularly for women, makeup is regarded as a necessary practice to beautify appearance. Emulsions are often used to alter the facial skin detail. Compacts are primarily used to hide defects and cover the original facial skin detail. Setting powder often adds detail to the skin. In addition, other color makeup, such as eyeliner and eye shadow, is applied on top of the setting powder.
Ever-developing makeup techniques now cater to different facial types, scenes, ages, skin types, and even costumes [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Choosing makeup by physically trying it on creates a personal experience, but it is time-consuming and can damage the skin.
Our method, a technical application of facial makeup transfer, takes all of the above circumstances into account. As shown in FIGURE 1, taking the image prototype (FIGURE 1a) as the input image and the pattern example (FIGURE 1b) as the reference image, our method transfers the makeup of the reference image to the input image to generate the output image (FIGURE 1c).
II Related Work
In 2007, Tong et al. [1] of the Hong Kong University proposed a face-to-face makeup transfer method based on a quotient image. The reference is a pair of images of the same person before and after applying makeup, and the quotient of that pair is used to transfer the reference makeup to the input facial image. Their method can be divided into four steps. First, the eyebrows and eyelashes of the input image are removed to prepare for eye makeup transfer, the resulting holes are filled using texture synthesis to extract the inherent skin features of the input image, and point correspondences between the facial image and a facial model containing 84 landmark points are specified manually to prepare for facial deformation. Second, the reference facial image is deformed to match the input image. Third, the quotient image, which captures the change introduced by the makeup on the same face before and after, is multiplied by the input image to achieve the makeup transfer. Finally, eye makeup requires additional processing, since it is generally more complicated and its color is more variable.
In 2009, Guo et al. [2] of the National University of Singapore proposed a simpler method that requires only a reference image after makeup, not a reference image before makeup. The method first performs facial alignment between the input facial image and the reference facial image; since the information is transferred pixel by pixel, the two images need to be fully aligned before transferring. Each image is then decomposed with an edge-preserving smoothing filter into three layers: a facial structure layer, a facial color layer, and a facial detail layer. The information in each layer of the reference image is transferred to the corresponding layer of the input image in a different way: the facial detail layer is transferred directly, and the facial color layer is transferred by alpha blending. The three resulting layers are composed to obtain the result image.
In 2015, Li et al. [3] of Zhejiang University proposed a facial makeup editing method based on intrinsic images. The method uses intrinsic image decomposition to decompose the input facial image directly into an illumination layer and a reflectance layer, and then edits the makeup information of the facial image in the reflectance layer without needing a reference image. Finally, the decomposed layers are recombined to obtain the makeup editing effect.
In 2016, Liu et al. [4] of NVIDIA Research designed a new deep convolutional neural network for makeup transfer, which can not only transfer facial makeup, eye shadow, and lip makeup, but also recommend the most suitable makeup for the input image. The network consists of two consecutive steps. The first step uses a fully convolutional network (FCN) to parse the face into different parts, which are distinguished by different colors. In the second step, the input image with its facial parsing map and the reference image with its facial parsing map are used as input to the makeup transfer network. According to the characteristics of facial makeup, eye shadow, and lip makeup, the three are processed with different loss functions and then integrated, and the retained part of the input facial image is added to obtain the final result image.
In 2018, Chang et al. [5] of Princeton University in the United States proposed the PairedCycleGAN network for transferring the facial makeup of the reference image to the input image. The main idea is to train a generator network and a discriminator network to transfer a specific makeup style. Chang et al. [5] trained three generators separately, focusing the network capacity and resolution on the unique features of each region. For each pair of images before and after makeup, a facial parsing algorithm is first applied to segment each facial component, such as the eyes, eyebrows, lips, and nose. Each component is then processed separately and the results are recombined.
III Facial Makeup Transfer
Our method takes as input the input image, to which the facial makeup is applied, and the reference image, which provides the example makeup style. The result is the output image, which retains the facial structure of the input image while applying the makeup style of the reference image. The notation we use is listed in TABLE I.
The complete pipeline is shown in FIGURE 2. Before the pipeline begins, we apply whitening and smoothing to the input image as a small preprocessing optimization. The pipeline has four main steps. First, the input facial image and the reference facial image are aligned. Since the information is transferred pixel by pixel, the two images need to be accurately aligned before the makeup transfer. We use a modified Active Shape Model to find 90 corresponding feature points and an affine transformation to warp the reference image onto the input image. Second, layer decomposition is performed: both the input image and the warped reference image are decomposed into three layers, namely the facial structure layer, the facial color layer, and the facial detail layer. Third, the information in each layer of the reference image is transferred to the corresponding layer of the input image in its own way: facial detail is transferred directly; facial color is transferred through alpha blending; and the illumination of the facial structure layer is transferred with a dedicated algorithm. The three resulting layers are then combined. Fourth, we use facial parsing to estimate the facial label probability of each pixel, and then fuse the components retained from the input image with the components retained from the initial makeup result, weighted by these probabilities, into the final makeup.
III-A Whitening and Smoothing
On the one hand, we use the OpenCV color balance algorithm to achieve facial whitening. Color balance globally adjusts the dominant colors of the image, namely red, green, and blue. The process is briefly as follows: first, each pixel is assigned to a brightness range (highlights, mid-tones, or shadows); next, the adjustment parameters of each brightness range are set from the color balance coefficients; then, the red, green, and blue channel values used for adjusting the image color are computed; finally, the color of the whole image is balanced based on these channel values. On the other hand, we use the OpenCV bilateral filtering algorithm [12] to achieve facial smoothing. Bilateral filtering performed in the CIELAB color space is the most natural type of filtering for color images: only perceptually similar colors are averaged together, and only perceptually important edges are preserved while noise is eliminated. The basic idea of bilateral filtering is to consider not only the spatial distance of each pixel in the convolution kernel from the central pixel, but also the similarity between its value and that of the central pixel. Two weights are generated accordingly, one from the spatial distance and one from the value similarity, and both are applied when computing the central pixel, which yields an edge-preserving low-pass filter.
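The two-weight idea behind bilateral filtering can be illustrated with a minimal NumPy sketch. This is not the OpenCV implementation the paper uses (OpenCV exposes it as `cv2.bilateralFilter`); the function name `bilateral_filter`, the single-channel input, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_range=0.1):
    """Naive bilateral filter for a 2-D float image (illustrative sketch)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    weight_sum = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor values at offset (dy, dx), aligned with img.
            shifted = padded[radius + dy: radius + dy + h,
                             radius + dx: radius + dx + w]
            # Weight 1: spatial closeness to the central pixel.
            w_space = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_space ** 2))
            # Weight 2: similarity of the neighbor value to the central value.
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = w_space * w_range
            out += weight * shifted
            weight_sum += weight
    return out / weight_sum

# Hypothetical test image: a noisy vertical step edge.
rng = np.random.default_rng(0)
step = np.zeros((16, 16))
step[:, 8:] = 1.0
noisy = step + rng.normal(0.0, 0.02, step.shape)
smoothed = bilateral_filter(noisy)
```

In this sketch the noise inside each flat half is averaged away, while pixels across the step receive a near-zero range weight, so the edge stays sharp.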
III-B Facial Alignment
For facial alignment, we first use the modified Active Shape Model (ASM) of Milborrow et al. [14] to obtain the facial feature points, and then use an affine transformation to warp the reference image onto the input image. Because faces under the many possible makeup styles vary widely in appearance, our facial landmarking software combines automatic detection with manual adjustment to obtain more precise facial feature points. An example of the 90 landmark points on the face is shown in FIGURE 3.
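As a rough illustration of the alignment step, the sketch below fits one global affine transform to corresponding landmark points by least squares. Whether the paper fits a single global transform or warps piecewise is not stated here, so the global fit is an assumption, as are the helper names `estimate_affine` and `apply_affine` and the synthetic landmarks.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src_pts onto dst_pts."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    # Homogeneous source coordinates: each row is (x, y, 1).
    src_h = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve src_h @ X = dst for X (3x2); the affine matrix is X transposed.
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return X.T

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ A[:, :2].T + A[:, 2]

# Hypothetical data: 90 reference landmarks and their positions in the input image.
rng = np.random.default_rng(1)
ref_landmarks = rng.uniform(0.0, 200.0, size=(90, 2))
true_A = np.array([[0.9, -0.1, 12.0],
                   [0.1, 1.1, -5.0]])
input_landmarks = apply_affine(true_A, ref_landmarks)

A = estimate_affine(ref_landmarks, input_landmarks)
```

The recovered 2x3 matrix could then be passed to an image resampling routine such as OpenCV's `cv2.warpAffine` to warp the reference image itself.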
III-C Layer Decomposition
The face is segmented according to the component label of each pixel. As shown in FIGURE 4, we use the facial parsing method of Liu et al. [6] to define the facial components and obtain a component label for each pixel, including hair, eyebrows, eyes, nose, lips, mouth, facial skin, and background.
As shown in FIGURE 5, we use the 90 landmark feature points of both the input image and the reference image to warp the reference image onto the input image for facial alignment.
We parse the input facial image and select 11 kinds of labels, which cover nearly all the facial components. We then color the 11 facial component labels to obtain the facial hard mask. Next, we segment the face into different regions with the facial hard mask, which guides the different makeup transfer operations applied to each facial region.
We choose the CIELAB color space to decompose the input image and the warped reference image into a facial structure layer, a facial color layer (i.e. the CIELAB color channels a and b), and a facial detail layer. The CIELAB color space (Lukac et al. [7]) separates lightness from color better than other color spaces and is approximately perceptually uniform (Woodland et al. [8]).
Following the approach of Eisemann et al. [9] and Zhang et al. [10], and using the Weighted Least Squares (WLS) filter presented by Farbman et al. [11], we apply edge-preserving smoothing to the lightness layer to extract the facial structure layer, and then subtract the structure layer from the lightness layer to obtain the facial detail layer.
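The structure/detail split can be sketched in one dimension. The code below is a simplified WLS-style smoother on a single scanline of lightness values, not the 2-D filter used in the paper: the function name `wls_smooth_1d`, the parameters `lam`, `alpha`, and `eps`, and the use of raw lightness gradients (rather than log-luminance gradients) are all assumptions made for illustration.

```python
import numpy as np

def wls_smooth_1d(signal, lam=5.0, alpha=1.2, eps=1e-4):
    """Edge-preserving WLS-style smoothing of a 1-D signal (illustrative sketch)."""
    g = np.asarray(signal, dtype=np.float64)
    n = g.size
    # Smoothness weights: small across strong gradients, so edges are kept.
    grad = np.abs(np.diff(g))
    a = 1.0 / (grad ** alpha + eps)
    # Forward-difference operator D with shape (n - 1, n).
    D = np.diff(np.eye(n), axis=0)
    # Minimizing sum (u - g)^2 + lam * sum a_i (u_{i+1} - u_i)^2
    # leads to the linear system (I + lam * D^T A D) u = g.
    L = D.T @ (a[:, None] * D)
    return np.linalg.solve(np.eye(n) + lam * L, g)

# Hypothetical scanline of CIELAB lightness: a noisy step from 30 to 70.
rng = np.random.default_rng(2)
clean = np.concatenate([np.full(32, 30.0), np.full(32, 70.0)])
lightness = clean + rng.normal(0.0, 0.5, clean.size)

structure = wls_smooth_1d(lightness)   # large-scale layer
detail = lightness - structure         # residual fine detail
```

By construction the two layers sum back to the lightness signal, which is what allows the layers to be edited separately and recombined later.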
III-D Layer Transfer
We define the facial detail layer of the output image, i.e.
[TABLE]
We define the facial color layer of the output image as the alpha blending of the CIELAB color channels a and b of the input image and the warped reference image, i.e.
[TABLE]
where the mixing weight controls the blend of the two color channels, and the blending is applied only at image pixel points that belong to the skin region of the facial image.
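The color transfer described above can be sketched as a masked alpha blend. Since the original formula is not reproduced here, the linear form, the name `blend_color`, the weight name `gamma`, and its default value are assumptions for illustration only.

```python
import numpy as np

def blend_color(input_ab, ref_ab, skin_mask, gamma=0.8):
    """Alpha-blend the a/b color channels inside the skin region (sketch).

    input_ab, ref_ab: arrays of shape (H, W, 2) holding the CIELAB a and b channels.
    skin_mask: boolean array of shape (H, W), True for skin pixels.
    gamma: assumed mixing weight; 0 keeps the input color, 1 takes the reference.
    """
    input_ab = np.asarray(input_ab, dtype=np.float64)
    ref_ab = np.asarray(ref_ab, dtype=np.float64)
    blended = (1.0 - gamma) * input_ab + gamma * ref_ab
    # Only skin pixels receive the blended color; others keep the input color.
    return np.where(np.asarray(skin_mask)[..., None], blended, input_ab)

# Hypothetical 2x2 example.
input_ab = np.zeros((2, 2, 2))
ref_ab = np.full((2, 2, 2), 10.0)
skin = np.array([[True, False],
                 [False, True]])
out_ab = blend_color(input_ab, ref_ab, skin, gamma=0.8)
```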
We define the facial structure layer of the output image as
[TABLE]
III-E Illumination Transfer
We define the following formula to achieve illumination transfer:
[TABLE]
where the illumination transfer parameter weights the contribution of the input facial structure and the reference facial structure, and the transfer is applied only at image pixel points that belong to the facial skin region.
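One plausible reading of the illumination transfer is a weighted blend of the two structure layers inside the skin region, followed by recombination with the transferred detail layer. The paper's exact formula is not reproduced here, so the linear blend, the names `transfer_lightness` and `beta`, and taking the detail layer from the reference at every pixel are assumptions of this sketch.

```python
import numpy as np

def transfer_lightness(input_s, ref_s, ref_d, skin_mask, beta=0.5):
    """Blend structure layers in the skin region and add the detail layer (sketch).

    input_s, ref_s: structure layers of the input and warped reference, shape (H, W).
    ref_d: detail layer of the warped reference, shape (H, W).
    beta: assumed illumination transfer weight; 0 keeps the input illumination.
    """
    input_s = np.asarray(input_s, dtype=np.float64)
    ref_s = np.asarray(ref_s, dtype=np.float64)
    # Assumed form: linear blend of structure layers, skin pixels only.
    out_s = np.where(skin_mask, (1.0 - beta) * input_s + beta * ref_s, input_s)
    # Detail is taken directly from the reference, as the pipeline describes.
    out_d = np.asarray(ref_d, dtype=np.float64)
    # The output lightness is the sum of the structure and detail layers.
    return out_s + out_d

# Hypothetical example: bright input skin, dark reference makeup.
input_s = np.full((2, 3), 60.0)
ref_s = np.full((2, 3), 20.0)
ref_d = np.full((2, 3), 1.5)
skin = np.array([[True, True, False],
                 [True, False, False]])
out_l = transfer_lightness(input_s, ref_s, ref_d, skin, beta=0.5)
```

With `beta = 0` the dark reference makeup would leave the input lightness untouched, which corresponds to the gray appearance of dark makeup when illumination is not transferred; larger `beta` values pull the skin lightness toward the reference.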
IV Experiments and Results
IV-A Data Collection
For our makeup transfer experiments, we collect two separate high-resolution datasets in order to achieve better results: one containing before-makeup faces with nude or very light makeup, and another containing faces with a large variety of facial makeup styles. We collect these datasets ourselves from major websites. We manually verify that each facial image is indeed a before-makeup or with-makeup face, with eyes open and without occlusions. In this way, we obtain a before-makeup dataset of 526 images and a with-makeup dataset of 878 images.
IV-B Efficient Makeup Transfer
Comparison results between our method and that of Guo et al. [2] are shown in FIGURE 7. On the one hand, the method of Guo et al. [2] assumes that the illumination in the reference image is uniform, although it need not match that of the input image. If any shadow or specularity exists in the reference image, it is also transferred to the input image. To solve this problem, we introduce illumination transfer to detect and remove shadows and specularities; our results are shown in FIGURE 7.
On the other hand, the method of Guo et al. [2] does not work well for black and dark makeup. In their results, the dark regions appear gray and unnatural. In such physical makeup the black color acts as the foundation, but their method transfers only the detail introduced by the foundation. Black is represented as having no color in the CIELAB color channels, so its appearance is carried by the illumination, which is especially important to human perception. Because their method does not transfer the illumination, the dark color in their results appears gray. We solve this problem by adding illumination transfer with a user-controlled coefficient that sets the degree of illumination transfer; our results are shown in FIGURE 7.
IV-C Effective Makeup Transfer
Comparison results between our method and that of Chang et al. [5] are shown in FIGURE 8. The method of Chang et al. [5] can transfer only the makeup of the eyes and lips and cannot transfer the makeup of the skin, whereas our method transfers the eye and lip makeup as well as the skin makeup. In addition, their method cannot handle fine hair effectively, whereas ours can.
We also compare our method with that of Liu et al. [4]. As shown in FIGURE 7, our makeup results are better than those of Liu et al. [4]. Our method transfers the facial skin detail of the reference image and thus forms new detail, which the method of Liu et al. [4] cannot do. Furthermore, because our method combines makeup transfer with relighting, it can handle reference images with black eye makeup and dark makeup, which the method of Liu et al. [4] cannot.
Last but not least, the time and space complexity of our method is lower than that of Liu et al. [4]. As shown in TABLE II, our method runs within 2 seconds on an iPhone 6 for a pair of color images, whereas the method of Liu et al. [4] takes 6 seconds on a TITAN X GPU for a pair of color images.
IV-D Air-Bangs Makeup Transfer
So far, neither traditional computer vision methods nor deep learning methods handle makeup transfer well when the reference example has air-bangs. These methods rely on extremely accurate facial feature landmarks to generate a natural facial mask. Because reference examples in real life are very diverse, these methods cannot separate hair from skin naturally, so the hair of the reference example is transferred along with the makeup. To handle this difficult case, we further improve our method above and resolve the hair and skin boundary problem in makeup transfer, as shown in FIGURE 9.
The main process is as follows. First, we perform facial whitening and smoothing and use the facial parsing method of Liu et al. [6] to acquire the hard mask of the input image with air-bangs. Then we use the method described above to generate the initial makeup; in this process, the hair of the reference image is also unexpectedly transferred to the input image. Second, we convert the hard mask into a soft mask, which describes the facial components in terms of probability. Using the soft mask of the input image, we preserve four components from the input image: the eyes, mouth, air-bangs, and background. At the same time, we preserve four components from the initial makeup: the skin, eyebrows, nose, and lips. Third, we fuse the pixels of the components retained from the input image with those of the components retained from the initial makeup result, weighted by their probabilities. Finally, we combine the fusion results to generate the final makeup.
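The probability-weighted fusion can be sketched as a per-pixel convex combination. How the paper converts the hard mask into a soft mask is not specified in this section, so the box blur in `soften_mask` is a stand-in assumption, and the names `soften_mask` and `fuse_with_soft_mask` are hypothetical.

```python
import numpy as np

def soften_mask(hard_mask, radius=1):
    """Turn a binary mask into per-pixel probabilities with a box blur (assumed)."""
    m = np.asarray(hard_mask, dtype=np.float64)
    h, w = m.shape
    padded = np.pad(m, radius, mode="edge")
    acc = np.zeros_like(m)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (size * size)

def fuse_with_soft_mask(input_img, makeup_img, keep_input_prob):
    """Blend the input image and the initial makeup result per pixel.

    keep_input_prob: probability that a pixel belongs to a component kept from
    the input image (eyes, mouth, air-bangs, background), shape (H, W).
    """
    p = np.clip(keep_input_prob, 0.0, 1.0)[..., None]
    return p * input_img + (1.0 - p) * makeup_img

# Hypothetical example: the left half of the image is kept from the input.
hard = np.zeros((4, 6), dtype=bool)
hard[:, :3] = True
soft = soften_mask(hard, radius=1)

input_img = np.ones((4, 6, 3))      # stands in for the original photo
makeup_img = np.zeros((4, 6, 3))    # stands in for the initial makeup result
final = fuse_with_soft_mask(input_img, makeup_img, soft)
```

Pixels well inside a retained region keep the input image exactly, pixels well inside the makeup region take the makeup result, and pixels near the boundary are mixed, which avoids a visible seam between hair and skin.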
IV-E Quantitative Comparison
The quantitative comparison focuses on the quality of the makeup transfer and the degree of harmony. On the one hand, we conduct 100 makeup transfer experiments and compare our results with those of Guo et al. [2], Neural Style [13], and Liu et al. [4]. Each time, a 7-tuple, consisting of an input facial image, a reference facial image, and the result facial images produced by our method and the above methods, is sent to 20 participants to compare. Note that the result facial images are shown in random order. The participants rate the results on five levels: “much better”, “better”, “same”, “worse”, and “much worse”. The percentages for each level are shown in TABLE III. Our method is rated much better than Guo et al. [2] in 23.6% of cases, much better than NeuralStyle-CC and NeuralStyle-CS in 90.1% and 92.3% of cases, and much better than Liu et al. [4] in 32.9% of cases.
On the other hand, we conduct a user study on Amazon Mechanical Turk with pairwise comparisons between the results of Chang et al. [5] and those of our method. We randomly select 102 pairs of input and reference facial images, giving 102 groups of makeup transfer results to compare. We then ask 10 or more subjects to select which result better matches the makeup style of the reference. On average, 87.3% of subjects prefer our results over those of Chang et al. [5].
V Conclusion
In this paper, we propose a novel makeup transfer method that adapts to most sample images. The main innovations are as follows: first, in the makeup transfer process, we perform illumination transfer on the facial structure layer with a dedicated algorithm; second, we extend makeup transfer to reference images with air-bangs. The major advantages of our method are that it is efficient, effective, and able to handle reference images with air-bangs.
Since only skin detail and color information are required from the reference images to beautify the appearance, the facial structure of the input image is no longer needed, which helps protect the privacy of the makeup actor. We add the latest and most fashionable makeup examples to our system so that users can apply virtual makeup to their faces in real time according to individual needs, much like a tailor-made personal beauty salon.
As elaborated above, our approach has the following three advantages:
- (1) Black or dark and white makeup can be effectively transferred by introducing illumination transfer;
- (2) Makeup is transferred efficiently, within seconds, compared to makeup methods based on deep learning frameworks;
- (3) Makeup can be transferred successfully from reference examples with air-bangs.
VI Acknowledgements
We thank all the editors, the reviewers, and Prof. Yebin Liu for his advice. This work is partially supported by the National Natural Science Foundation of China (Grant Nos. 61772047, 61772513), the Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2019C03), the Open Funds of CETC Big Data Research Institute Co., Ltd. (Grant No. W-2018022), the Science and Technology Project of the State Archives Administrator (Grant No. 2015-B-10), and the Fundamental Research Funds for the Central Universities (Grant Nos. 328201803, 328201801). Parts of this paper have appeared in our previous work. This is the extended journal version of the conference paper: X. Li, R. Han, N. Ning, X. Zhang and X. Jin. Efficient and Effective Face Makeup Transfer. The 4th International Symposium on Artificial Intelligence and Robotics (ISAIR), Daegu, Korea, 20-24 August, 2019.
- [1] Wai-Shun Tong, Chi-Keung Tang, Michael S. Brown, and Ying-Qing Xu. Example-based cosmetic transfer. In Proceedings of the Pacific Conference on Computer Graphics and Applications, Pacific Graphics 2007, Maui, Hawaii, USA, October 29 - November 2, 2007, pages 211–218, 2007.
- [2] Dong Guo and Terence Sim. Digital face makeup by example. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 73–79, 2009.
- [3] Chen Li, Kun Zhou, and Stephen Lin. Simulating makeup through physics-based manipulation of intrinsic image layers. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 4621–4629, 2015.
- [4] Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, and Xiaochun Cao. Makeup like a superstar: Deep localized makeup transfer network. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 2568–2575. AAAI Press, 2016.
- [5] Huiwen Chang, Jingwan Lu, Fisher Yu, and Adam Finkelstein. PairedCycleGAN: Asymmetric style transfer for applying and removing makeup. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 40–48, 2018.
- [6] Sifei Liu, Jimei Yang, Chang Huang, and Ming-Hsuan Yang. Multi-objective convolutional learning for face labeling. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3451–3459, 2015.
- [7] Rastislav Lukac, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. Color image processing. Computer Vision and Image Understanding, 107(1-2):1–2, 2007.
- [8] Alan Woodland and Frédéric Labrosse. On the Separation of Luminance from Colour in Images. In Mike Chantler, editor, Vision, Video, and Graphics (2005). The Eurographics Association, 2005.
