Effect of Super Resolution on High Dimensional Features for Unsupervised Face Recognition in the Wild

Ahmed ElSayed; Ausif Mahmood; Tarek Sobh

arXiv:1704.01464·cs.CV·November 14, 2018


TL;DR

This paper investigates how a state-of-the-art super resolution algorithm enhances high-dimensional features in unsupervised face recognition from low-resolution, in-the-wild images, showing significant improvements in recognition accuracy.

Contribution

It demonstrates the positive impact of super resolution on high-dimensional feature-based unsupervised face recognition in uncontrolled environments.

Findings

01. Super resolution improves recognition rates significantly.

02. Enhanced images lead to better feature extraction.

03. Unsupervised algorithms benefit from super resolution enhancements.


Equations

$$\chi^{2}(X, Y) = \sum_{i,j} \frac{(x_{i,j} - y_{i,j})^{2}}{x_{i,j} + y_{i,j}}$$



Abstract

The majority of face recognition algorithms use query faces captured in uncontrolled, in-the-wild environments. Because of the cameras' limited capabilities, these captured facial images are commonly blurred or of low resolution. Super-resolution algorithms are therefore crucial for improving the resolution of such images, especially when the image is small and requires enlargement. This paper aims to demonstrate the effect of one of the state-of-the-art algorithms in the field of image super-resolution. To demonstrate the functionality of the algorithm, various before-and-after 3D face alignment cases are provided using images from the Labeled Faces in the Wild (lfw) dataset. The resulting images are tested on a closed-set face recognition protocol using unsupervised algorithms with high-dimensional extracted features. The inclusion of the super-resolution algorithm resulted in a significantly improved recognition rate over recently reported results obtained from unsupervised algorithms.

**Index Terms:** Super-resolution, high-dimensional features, unsupervised learning, face recognition, Labeled Faces in the Wild (lfw)

1 Introduction

The majority of surveillance cameras are installed outdoors, and the captured images are therefore likely to be affected by the surrounding environment. These images are called “images in the wild”, and when they are used for face recognition, their size and resolution affect the recognition accuracy. The current literature offers few studies that focus on this problem. Existing studies [1, 2, 3, 4] mostly address video-based or multi-frame super-resolution reconstruction of low-resolution face images. In these, the authors examine the performance of traditional face recognition techniques on low-resolution faces and on super-resolution faces constructed from multi-frame videos. In real-world applications, however, the problem at hand often involves a single query image rather than a multi-frame video.

Other relevant studies [5, 6] use single-image super-resolution algorithms to study the performance of face recognition algorithms at varying face resolutions. However, these studies did not investigate the performance of face recognition using high-dimensional features. Furthermore, both studies used test datasets made up of images captured in controlled environments.

This research studies the performance of unsupervised face recognition on the Labeled Faces in the Wild (lfw) dataset [7, 8] using a single-image super-resolution algorithm. The effect of the algorithm on the high-dimensional features used in the face recognition process is investigated. Each image in the dataset is 3D aligned and frontalized using the face frontalization algorithm proposed in [9].

The main contributions of this paper are:

• Applying Local Binary Pattern (LBP) and Multi-Scale LBP features to faces captured in the wild and using the calculated features in unsupervised closed-set face recognition.

• Studying the effect of a single-image super-resolution algorithm versus bicubic scaling on unsupervised face recognition in the wild.

• Examining the order in which face frontalization and the image sharpening (super-resolution) process are applied.

The following sections describe the super-resolution algorithm and the LBP high-dimensional features. After a comparative analysis of these features, the proposed experiments and the techniques they use are described, followed by an explanation of the results. Conclusions and discussion are given in the final section.

2 Single Image Super-Resolution

A super-resolution algorithm is used to enhance image resolution and to recover additional detail in the input image. In this work, the super-resolution algorithm based on a Convolutional Neural Network (CNN) described in [10] is used. The system first generates an enlarged but still low-resolution image from the input image using bicubic interpolation. This image is then fed to the CNN structure shown in Figure 1, which improves the peak signal-to-noise ratio (PSNR) and produces a higher-resolution image that should be close to the original image in quality. The use of a CNN makes this algorithm superior to other SR techniques that learn a mapping from low-resolution to high-resolution images, owing to its simplicity and its higher resulting PSNR.
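Since PSNR is the quality measure named here, a minimal sketch of how it is commonly computed may help. It uses the standard definition $10 \log_{10}(\text{peak}^{2} / \text{MSE})$ and assumes 8-bit grayscale images stored as lists of rows; the function name `psnr` and this data layout are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Both images are lists of rows of pixel intensities (assumed layout).
    """
    total, count = 0.0, 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            total += (r - e) ** 2
            count += 1
    mse = total / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher value means the reconstructed image is closer to the reference, which is the sense in which the CNN output is compared with the bicubic one.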

3 High Dimensional Features

Unsupervised face recognition has recently attracted interest because it can handle unlabeled faces, especially in closed datasets as in [11, 12, 13]. Research on high-dimensional features has produced remarkable results in face recognition and verification, particularly with supervised learning as in [14, 15]. These features, however, have not been sufficiently explored with unsupervised techniques. This section demonstrates the use of one of those features with an unsupervised metric under the closed-set protocol on the lfw dataset.

In [11], LBP features provided remarkable unsupervised face recognition results for faces captured in a controlled environment. Therefore, the same Chi square metric, given in equation 1, is used to test the features extracted from the lfw dataset.

$$\chi^{2}(X, Y) = \sum_{i,j} \frac{(x_{i,j} - y_{i,j})^{2}}{x_{i,j} + y_{i,j}}, \qquad (1)$$

where $X$ and $Y$ are the histograms to be compared, and $i$ and $j$ index the $i$-th bin of the histogram corresponding to the $j$-th local region.
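The Chi square metric sums, over every bin of every local region, the squared difference of the two histogram values divided by their sum. A minimal sketch on already concatenated histograms follows; the function name and the handling of bins that are empty in both histograms (skipped, to avoid dividing by zero) are assumptions, since the text does not say how such bins are treated.

```python
def chi_square_distance(x, y):
    """Chi-square distance between two concatenated histograms of equal length."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        denom = xi + yi
        if denom > 0:  # assumption: bins empty in both histograms contribute 0
            total += (xi - yi) ** 2 / denom
    return total
```

For example, `chi_square_distance([4, 0], [0, 4])` evaluates to 8.0, while two identical histograms give 0.0.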

In this test, three types of LBP features are compared. The first is the regular uniform LBP feature, extracted from frontalized faces by dividing the 90x90 face into 10x10 blocks of 9x9 pixels each. The uniform $LBP_{8,2}^{u2}$ code (8 neighbors at radius 2) is then calculated for each block as in [11]. The histograms of all blocks are concatenated to form a single vector representation of the face image, to be used in equation 1. The resulting vector has length 5900 (100 blocks with 59 histogram bins each).
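The block-wise descriptor can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not the authors' code: neighbors are sampled with bilinear interpolation, coordinates outside the image are clamped to the border, a small tolerance guards the comparison against rounding, and all names (`lbp_descriptor`, `bilinear`, and so on) are hypothetical.

```python
import math

P, R = 8, 2  # neighbours and radius of LBP_{8,2}


def transitions(code, bits=P):
    """Number of 0/1 transitions in the circular bit pattern of `code`."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")


# Each uniform pattern (at most 2 transitions) gets its own bin;
# all non-uniform patterns share one extra bin.
UNIFORM = [c for c in range(2 ** P) if transitions(c) <= 2]
BIN_OF = {c: k for k, c in enumerate(UNIFORM)}
N_BINS = len(UNIFORM) + 1  # 58 + 1 = 59 for P = 8

OFFSETS = [(-R * math.sin(2 * math.pi * k / P), R * math.cos(2 * math.pi * k / P))
           for k in range(P)]


def bilinear(img, y, x):
    """Bilinearly interpolated intensity at real-valued (y, x), clamped to the image."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def lbp_code(img, y, x):
    """8-bit LBP code of pixel (y, x): one bit per neighbour that is >= the centre."""
    center = img[y][x]
    code = 0
    for k, (dy, dx) in enumerate(OFFSETS):
        if bilinear(img, y + dy, x + dx) >= center - 1e-9:  # tolerance for rounding
            code |= 1 << k
    return code


def lbp_descriptor(img, grid=10):
    """Concatenate per-block uniform LBP histograms over a grid x grid layout."""
    bh, bw = len(img) // grid, len(img[0]) // grid
    feature = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * N_BINS
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    hist[BIN_OF.get(lbp_code(img, y, x), N_BINS - 1)] += 1
            feature.extend(hist)
    return feature
```

Called on a 90x90 image (a list of 90 rows of 90 intensities), `lbp_descriptor` returns a list of 100 x 59 = 5900 counts, matching the vector length given in the text.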

The second type of LBP is a Multi-Scale representation. The frontalized face is scaled down 5 times, and at each scale the image is divided into 10x10 blocks of 9x9 pixels each, as shown in Figure 2 a. The $LBP_{8,2}^{u2}$ histogram is then calculated for each block at each scale, and all histograms are concatenated to form a vector representation of the face with a length of 12980.
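As a consistency check on the reported length (an inference from the numbers in the text, not a detail the text states), note that the single-scale vector of 5900 entries over 100 blocks implies 59 bins per histogram, so:

$$\frac{12980}{59} = 220 \text{ blocks}, \qquad 220 = 10^{2} + 8^{2} + 6^{2} + 4^{2} + 2^{2}.$$

One layout that matches this count, assuming the 9x9-pixel block size is held fixed across scales, is five face sizes of 90, 72, 54, 36 and 18 pixels, giving grids of 10x10, 8x8, 6x6, 4x4 and 2x2 blocks respectively. A fixed 10x10 grid at all five scales would instead give $5 \times 100 \times 59 = 29500$ entries, so the block count most likely shrinks with the image.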

The final LBP type is the HighDimLBP introduced in [14], where the faces are not frontalized; instead, an accurate landmark detection technique is used to obtain facial landmarks. For each landmark in the 300x300 image, a 40x40 grid centered at the landmark point is constructed, and $LBP_{8,2}^{u2}$ is calculated over each 10x10-pixel block, as shown in Figure 2 b. All histograms from all blocks for all landmark points at the 5 different scales are then concatenated to form a vector representation of the face image. The length of this vector for one image is 127440, which is very long and computationally expensive. Therefore, in some cases, the size is reduced to 400 using principal component analysis (PCA) to improve the computational performance. A similar approach has also been used in [15, 14].

The three types are compared in order to select the best technique for the proposed experiments. The following sections describe the experiments and their results.
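The reported vector length can be cross-checked with a few lines of arithmetic. The sketch below assumes 59 bins per uniform LBP histogram (as implied by the 5900-entry single-scale vector) and a 40x40 patch split into 10x10-pixel blocks; the number of landmarks is not stated in the text, so it is derived here rather than assumed.

```python
N_BINS = 59              # bins per uniform LBP histogram (8 neighbours)
PATCH, BLOCK = 40, 10    # 40x40 patch per landmark, 10x10-pixel blocks
SCALES = 5
REPORTED_LENGTH = 127440

blocks_per_patch = (PATCH // BLOCK) ** 2            # 4 x 4 = 16 blocks
per_landmark = blocks_per_patch * SCALES * N_BINS   # 16 * 5 * 59 = 4720 entries
landmarks, remainder = divmod(REPORTED_LENGTH, per_landmark)

print(landmarks, remainder)
```

The division is exact and yields 27, so the reported length is consistent with 27 landmark points per face under these assumptions.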

4 Experiment Description

This paper proposes two experiments to examine the effect of image super-resolution, and of the order in which it is applied relative to frontalization, on the unsupervised face recognition process based on the features described in section 3. The two experiments are as follows:

1. Apply face frontalization first. The workflow of this experiment is depicted in Figure 3 a and can be described as follows:

(a) Detect and frontalize the face from the original-size image (250x250 in this case).

(b) Scale the face image down by a factor of 3 to simulate detecting the face at that size. If the frontalized face image is 90x90, the resulting image is 30x30, an appropriate size for face detection techniques (the minimum detection size of the Haar Cascade classifier for face detection is 24x24).

(c) Scale the face image up again by a factor of 3 using bicubic interpolation.

(d) Apply the SRCNN algorithm to the scaled face to generate a super-resolution version.

(e) Extract uniform local binary pattern (LBP) features from the SR image by dividing it into 10x10 blocks and concatenating the histograms of all blocks. This step is applied to both the bicubic and the super-resolution scaled faces to compare the performance of the recognition process.

(f) For Multi-Scale LBP, scale the face image down to five scales as shown in Figure 2 a and concatenate the histograms of all blocks and scales. This step is likewise applied to both the bicubic and the super-resolution scaled faces to compare the performance of the recognition process.

(g) Calculate the $\chi^{2}$ distances between the extracted features using equation 1 to obtain the minimum distances between the query images and the probe ones.

2. Process face images prior to frontalization. The workflow of this experiment is shown in Figure 3 b and can be described in the following steps:

(a) Scale the face image down by a factor of 3.

(b) Scale the face image up again by a factor of 3 using bicubic interpolation.

(c) Apply the SRCNN algorithm to the scaled image to generate a super-resolution version.

(d) Extract frontalized faces from both the bicubic images and the super-resolution ones for performance comparison.

(e) Calculate features and distances as described in steps (e) to (g) of experiment 1.
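The only difference between the two experiments is where frontalization sits in the chain, which the following sketch makes explicit. The processing stages are passed in as functions, and all of their names (`frontalize`, `downscale`, `bicubic_up`, `srcnn`) are hypothetical placeholders rather than real library calls; feature extraction and distance calculation would follow on both returned images.

```python
def experiment_1(image, frontalize, downscale, bicubic_up, srcnn, scale=3):
    """Frontalize first, then degrade and restore the face (steps a to d)."""
    face = frontalize(image)
    low = downscale(face, scale)
    bicubic = bicubic_up(low, scale)
    return {"bicubic": bicubic, "sr": srcnn(bicubic)}


def experiment_2(image, frontalize, downscale, bicubic_up, srcnn, scale=3):
    """Degrade and restore the whole image first, then frontalize (steps a to d)."""
    low = downscale(image, scale)
    bicubic = bicubic_up(low, scale)
    return {"bicubic": frontalize(bicubic), "sr": frontalize(srcnn(bicubic))}
```

In experiment 1 the super-resolution step therefore sees an already frontalized face, whereas in experiment 2 the frontalization (and the face detection it depends on) operates on the rescaled image.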

5 Results

The proposed comparison and experiments have been tested on the Labeled Faces in the Wild (lfw) dataset [7, 8] using the closed-set face recognition protocol proposed in [15]. In this protocol, 10 groups are extracted from the entire dataset, each group having two sets: gallery and genuine probe. Both the gallery and the probe sets include images of 1000 different persons. Each gallery set contains 1000 images, one image per person, while the size of the probe set varies from one group to another, with an average of 4500 images of the same 1000 persons as in the gallery set. The recognition rates reported in this paper are the average recognition rates over all 10 protocol groups.
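A minimal sketch of how such an averaged closed-set rate could be computed is given below. It assumes a rank-1 decision (each probe is assigned the identity of its nearest gallery image under the Chi square distance), which the text does not state explicitly; the data layout and function names are also illustrative.

```python
def chi_square(x, y):
    """Chi-square distance; bins empty in both histograms are skipped (assumption)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def rank1_rate(gallery, probes):
    """Fraction of probes whose nearest gallery entry has the correct identity.

    gallery: dict mapping person id to one feature vector.
    probes: list of (person id, feature vector) pairs.
    """
    correct = 0
    for true_id, feature in probes:
        predicted = min(gallery, key=lambda pid: chi_square(gallery[pid], feature))
        correct += predicted == true_id
    return correct / len(probes)


def mean_rate(groups):
    """Average the rank-1 rate over protocol groups given as (gallery, probes) pairs."""
    rates = [rank1_rate(gallery, probes) for gallery, probes in groups]
    return sum(rates) / len(rates)
```

With the protocol described above, `groups` would hold 10 pairs, each with a 1000-entry gallery and a probe list of varying size.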

In this work, faces are detected using the Histograms of Oriented Gradients (HoG) algorithm proposed in [16], using Python. For each detected face, a landmark detection algorithm based on regression trees is then applied as in [17], also using Python (through the Python wrappers for the dlib and OpenCV libraries). Experiment 2 included some cases where the HoG-based face detection algorithm failed to detect faces due to the effect of image scaling. Therefore, a backup face detection algorithm based on the Adaboost Haar Cascade [18, 19] is used in cases where no faces were detected in the image. (The Adaboost Haar Cascade classifier is known to have a higher face detection rate, but with a large number of false positives [19, 18].)
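The fallback logic described here can be sketched independently of any particular library. In the snippet below, `primary` and `backup` are hypothetical callables that take an image and return an iterable of face boxes; the function name and the returned label are illustrative.

```python
def detect_faces(image, primary, backup):
    """Try the primary detector; use the backup only when it finds no face.

    Returns (boxes, name), where name records which detector produced the boxes.
    """
    boxes = list(primary(image))
    if boxes:
        return boxes, "primary"
    return list(backup(image)), "backup"
```

In a setup like the one described, `primary` might wrap the HoG detector returned by `dlib.get_frontal_face_detector()` and `backup` might wrap `detectMultiScale` on an OpenCV `cv2.CascadeClassifier`; these pairings are a plausible reading of the text, not a confirmed detail of the authors' code.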

First, the three different types of LBP features are compared on this dataset, with the Chi square metric used as the unsupervised face recognition metric. As shown in figure 4, the Multi-Scale LBP features outperform the other LBP types, especially the HighDimLBP+PCA method listed in [15]. However, as shown in table 1, Multi-Scale LBP and HighDimLBP with the Chi square distance have close recognition rates. It should also be noted that the computation time of the Chi square distance for HighDimLBP is significantly higher than for the other LBP types because of the length of the vector representation.

For the two experiments, the super-resolution convolutional neural network (SRCNN) algorithm is implemented using the Caffe library and tested using Matlab. However, instead of applying the SR algorithm only to the Y component of the YCbCr domain (the component that carries the high frequencies), in this test the SR algorithm is applied to all three channels of the RGB domain, in order to enhance both the edges and the colors of the pixels estimated by bicubic scaling.

For the protocol used, the faces are first frontalized as in [9], and unsupervised face recognition based on LBP and Multi-Scale LBP features is used to create a baseline for comparison. The baseline results are marked as lfw3D in the tables and figures. The results of experiment 1 with bicubic scaling are marked as lfw3D bicubic 3 channels, whereas those of the super-resolution version are marked as lfw3D SR 3 channels. The results of experiment 2 with bicubic scaling are marked as lfw bicubic 3 channels original cropped, whereas those of the super-resolution version are marked as lfw SR 3 channels original cropped.

As shown in figure 5, the super-resolution algorithm improves the recognition rates for both LBP and Multi-Scale LBP features over the bicubic-scaled version in both experiments. However, both remain lower than the baseline recognition rate. Moreover, the recognition rate of experiment 1 is higher than that of experiment 2. This is significant, since it indicates that applying face frontalization before the scaling and sharpening process gives better results than scaling the whole image up and then frontalizing the detected face. It can also be observed that Multi-Scale LBP performs better in both experiments and outperforms all other features used in the presented unsupervised test.
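Applying a single-channel enhancement to each RGB channel in turn can be sketched as follows. Here `sr_channel` is a hypothetical stand-in for the SRCNN forward pass on one image plane, and the image layout (rows of pixel tuples) is an assumption chosen to keep the example dependency-free.

```python
def apply_per_channel(image, sr_channel):
    """Apply a single-channel SR function to every channel of an H x W x C image.

    image: list of rows, each row a list of pixels, each pixel a tuple of C values.
    sr_channel: function mapping one H x W plane (list of rows) to an enhanced plane.
    """
    channels = len(image[0][0])
    # Split the image into one plane per channel.
    planes = [[[px[c] for px in row] for row in image] for c in range(channels)]
    enhanced = [sr_channel(plane) for plane in planes]
    # Re-interleave the enhanced planes into pixel tuples.
    return [[tuple(enhanced[c][y][x] for c in range(channels))
             for x in range(len(image[0]))]
            for y in range(len(image))]
```

The Y-only alternative mentioned in the text would instead convert to YCbCr, enhance the luminance plane alone, and leave the two chroma planes at their bicubic values.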

6 Conclusion

This work applied unsupervised face recognition to images from the Labeled Faces in the Wild (lfw) dataset using LBP and Multi-Scale LBP features. The results indicate that Multi-Scale LBP outperforms both LBP and HighDimLBP features, with reasonable extraction and distance calculation times. Two experiments were also introduced to measure the effect of applying a single-image super-resolution algorithm to faces captured in the wild, and the effect of the order in which it is applied relative to the face frontalization algorithm. It can be concluded that applying super-resolution to frontalized faces gives better results than applying super-resolution first. This is because face frontalization uses interpolation to calculate some pixel values, similar to bicubic scaling, and these values are then enhanced by the super-resolution technique. The results also indicate that applying super-resolution to bicubic-scaled faces gives a slight improvement in the unsupervised face recognition process in both experiments and with both types of features.

Bibliography


[1] Frank Lin, Clinton Fookes, Vinod Chandran, and Sridha Sridharan, Super-Resolved Faces for Improved Face Recognition from Surveillance Video, pp. 1–10, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
[2] F. W. Wheeler, X. Liu, and P. H. Tu, “Multi-frame super-resolution for face recognition,” in 2007 First IEEE International Conference on Biometrics: Theory, Applications, and Systems, Sept 2007, pp. 1–6.
[3] Shuowen Hu, Robert Maschal, S. Susan Young, Tsai Hong Hong, and P. Jonathon Phillips, “Face recognition performance with superresolution,” Appl. Opt., vol. 51, no. 18, pp. 4250–4259, Jun 2012.
[4] Yinghui Kong, Shaoming Zhang, and Peiyao Cheng, “Super-resolution reconstruction face recognition based on multi-level FFD registration,” Optik - International Journal for Light and Electron Optics, vol. 124, no. 24, pp. 6926–6931, 2013.
[5] Clinton Fookes, Frank Lin, Vinod Chandran, and Sridha Sridharan, “Evaluation of image resolution and super-resolution on face recognition performance,” Journal of Visual Communication and Image Representation, vol. 23, no. 1, pp. 75–93, 2012.
[6] Pejman Rasti, Tõnis Uiboupin, Sergio Escalera, and Gholamreza Anbarjafari, Convolutional Neural Network Super Resolution for Face Recognition in Surveillance Monitoring, pp. 175–184, Springer International Publishing, Cham, 2016.
[7] Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” Tech. Rep. 07-49, University of Massachusetts, Amherst, October 2007.
[8] Gary B. Huang and Erik Learned-Miller, “Labeled faces in the wild: Updates and new reporting procedures,” Tech. Rep. UM-CS-2014-003, University of Massachusetts, Amherst, May 2014.