Iris Verification with Convolutional Neural Network and Unit-Circle Layer
Radim Spetlik, Ivan Razumenic

TL;DR
This paper introduces a new CNN with a Unit-Circle Layer for iris verification, achieving state-of-the-art accuracy and significant performance improvements over existing methods on multiple datasets.
Contribution
The paper presents a novel CNN architecture with a unique Unit-Circle Layer that replaces Gabor-filtering, enhancing iris verification accuracy.
Findings
Achieved 10% higher accuracy than the previous best on CASIA.v4.
The Unit-Circle Layer improves performance by up to 15% on unseen data.
Validated on three public datasets with state-of-the-art results.
Abstract
We propose a novel convolutional neural network to verify a match between two normalized images of the human iris. The network is trained end-to-end and validated on three publicly available datasets yielding state-of-the-art results against four baseline methods. The network outperforms the state-of-the-art method on the CASIA.v4 dataset by a 10% margin. In the network, we use a novel Unit-Circle Layer which replaces the Gabor-filtering step in a common iris-verification pipeline. We show that the layer improves the performance of the model by up to 15% on previously-unseen data.
Radim Špetlík
Department of Cybernetics
Czech Technical University
Prague, CZ 120 00
Ivan Razumenić
Microsoft Development Center Serbia
Belgrade, 11070 Serbia
Work performed during an internship at Microsoft Development Center Serbia d.o.o.
Abstract
We propose a novel convolutional neural network to verify a match between two images of the human iris. The network is trained end-to-end and validated on three publicly available datasets yielding state-of-the-art results against four baseline methods. The network outperforms the state-of-the-art method on the CASIA.v4 dataset by a 10% margin. In the network, we use a novel “Unit-Circle” layer which replaces the Gabor-filtering step in a common iris-verification pipeline. We show that the layer improves the performance of the model by up to 15% on previously-unseen data.
1 Introduction
Iris verification is a biometric technique used for human identification. Given a pair of images of human irises, the task is to decide whether the irises match. Iris verification is applied widely, e.g., in border control, citizen authentication, or in forensics [21].
A common iris verification pipeline has three steps: iris detection, feature extraction, and matching (see Fig. 2; the interested reader is referred, e.g., to [5]). First, an iris is found and normalized. Second, the normalized iris is typically convolved with Gabor filters and converted into a “bitcode”, i.e. a matrix of binary numbers. Third, two bitcodes are compared. The bitcodes match if their Hamming distance is smaller than a given threshold.
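The matching step of the common pipeline can be illustrated with a minimal sketch. It assumes that the bitcodes are flattened into equally long lists of 0/1 values and that the Hamming distance is normalized by the code length; the threshold value of 0.32 is a hypothetical choice for illustration, not a value taken from this paper.

```python
def hamming_distance(code_a, code_b):
    """Fraction of positions at which two equally sized bitcodes differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("bitcodes must be non-empty and of the same length")
    differing = sum(a != b for a, b in zip(code_a, code_b))
    return differing / len(code_a)


def bitcodes_match(code_a, code_b, threshold=0.32):
    """Declare a match when the normalized Hamming distance is below the
    threshold (the default threshold is an assumption of this sketch)."""
    return hamming_distance(code_a, code_b) < threshold


# Two toy 8-bit codes that differ in exactly two positions.
probe = [1, 0, 1, 1, 0, 0, 1, 0]
gallery = [1, 0, 0, 1, 0, 1, 1, 0]
print(hamming_distance(probe, gallery))  # 0.25
print(bitcodes_match(probe, gallery))    # True
```

In a deployed pipeline, the threshold is one of the data-dependent parameters that has to be tuned.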
Feature extraction and matching are highly data-dependent in a common iris verification pipeline and therefore require parameter-tuning. Since the task is not convex, an exhaustive search for parameters is performed. In this paper, we propose a method which replaces the feature extraction and matching part of the iris verification pipeline with a single fully convolutional neural network and a single learning rule – the backward propagation of errors or backpropagation. The network is trained end-to-end using the binary cross-entropy loss function. The input of the network is a pair of normalized irises, the output is a single number which is interpreted as a posterior probability of a match (see Fig. 1).
So far, convolutional neural networks were used in iris verification for better feature encoding. To encode the features, standard blocks of convolutions, max-pooling, and batch normalization layers were used. We introduce a novel “Unit-Circle layer” that replaces the feature extraction step in a common iris verification pipeline and is learned optimally by backpropagation.
The contributions of this paper are the following: (i) we propose a novel method of iris verification that replaces the feature extraction and matching steps of a commonly used iris verification pipeline with a single convolutional neural network (IrisMatch-CNN) that is trained end-to-end and is robust to changes in the iris image acquisition setup, (ii) as opposed to metric-learning iris verification, we compare two images of irises directly and train the network with the binary cross-entropy loss, (iii) we evaluate the method on three public datasets against four methods, achieving state-of-the-art results.
2 Related work
To the best of our knowledge, there is only one work [8] in which a convolutional neural network (CNN) extracts the features and performs the iris verification at the same time. However, the method is designed to verify a match only between a pair of heterogeneous irises, i.e. irises from different sources.
Commonly, researchers in the iris verification domain use a CNN to better encode the iris features. In the following text, we present the methods that use a CNN at some point in the iris verification pipeline.
CNN as the feature extractor
We start with the methods that use a CNN as the feature extraction tool.
[11] use a pre-trained CNN network to produce a feature vector used for verification. The verification is performed with a support vector machine.
In [19], a deep CNN generates a compact representation of iris and periocular regions. The input of the network is a normalized iris image, the output is a -dimensional feature vector. Cosine similarity, norm, norm, and covariance measures are used to match two feature vectors.
In our experiments, we follow the methodology presented in [21]. The authors propose a deep learning framework composed of a CNN that generates iris descriptors and a sub-network that provides a mask identifying iris regions meaningful for matching. The network is trained using a specially designed Extended Triplet Loss that incorporates bit-shifting and non-iris masking. The input of the network is a normalized iris image. The output is a feature map that is, together with the mask, used to perform the matching. Matching is done by computing the Hamming distance of two binarized feature maps, taking into account their masks. Experiments on four publicly available databases are presented in which the introduced method outperforms four iris recognition approaches.
Another approach that uses CNN to decode features is presented in [17]. The learning of the network is formulated as a classification problem. The input is a normalized iris image and output is a -dimensional softmax layer where each class corresponds to a set of irises of a particular person. After the training is finished, the fifth convolutional layer is used as a feature map. To improve robustness, custom ordinal measure is computed that produces a binary vector which is used to perform the matching.
[6] present a deep CNN that encodes the iris into a -dimensional feature vector. The learning is stated as a classification problem where each class corresponds to a set of irises of a single person. After the training, the output of the second last fully connected layer is used to compute a similarity score – the Euclidean distance. The input of the network is a gray-scale iris image normalized to polar coordinates.
CNN used differently than the feature extractor
We follow with three works that use the CNN in another way than just an extractor of the features.
Authors of [8] design a CNN to verify the relationship between two heterogeneous iris images. A “pairwise filter” layer is introduced to extract features from a pair of normalized irises from different sources. For a single pair of irises, six input pairs are generated, explicitly encoding iris rotations and ordering of pairs. The output of the network is a similarity score – if the two normalized irises belong to the same identity, [math] otherwise. Experiments only for irises from heterogeneous sources are presented. It is not clear which loss function is used.
A CNN in [14] distinguishes between corresponding / noncorresponding patches on a normalized iris image. The output of the network is a single scalar – a probability that the patches correspond. The output of the CNN serves as an input to a Markov random field used to infer a deformation model between a pair of iris images. Given the deformation parameters, the histogram of magnitudes and phase angles are computed. Classification is done with a binary classifier.
A pre-trained ResNet18 in [10] verifies if two irises match. Despite significant efforts, we were not able to fully comprehend the details of the method. To be more specific: (a) it is not clear what are the inputs of the network, (b) it is not clear why the output of the modified ResNet18 is fed into two perceptrons – one for the positive class, the other for negative – and not to a single perceptron, (c) it is not clear, how the outputs of the two perceptrons are used in the final decision.
3 Method
We propose a convolutional neural network (CNN) to verify a match of two normalized iris images (see Fig. 3). The input of the network is a pair of normalized iris images. The output is a single scalar interpreted as the posterior probability of the match.
The verification has two parts. First, the features are extracted with a novel “Unit-Circle layer”. Second, the features are concatenated and fed into the “Matcher” – a fully convolutional neural network which outputs a single scalar, the probability of the match. Used together, the Unit-Circle layer and the Matcher CNN form a network architecture that we refer to as IrisMatch-CNN.
Let be the training set that contains tuples of normalized-iris images . Each tuple contains images of the same iris. Symbol denotes the set of all input iris images.
3.0.1 Unit-Circle layer
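Let $f(x_i; \theta)$ be the output of a standard convolutional layer with a single input channel and two output channels for the $i$-th normalized iris image $x_i$, where $\theta$ is a concatenation of the parameters of the filter. We define the output of the Unit-Circle layer on the $r$-th row and $c$-th column as

$$u(x_i; \theta)_{r,c} = \frac{f(x_i; \theta)_{r,c}}{\lVert f(x_i; \theta)_{r,c} \rVert_2},$$

where $f(x_i; \theta)_{r,c} \in \mathbb{R}^2$ is the two-channel response at that pixel.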
In other words, the output of the Unit-Circle layer (U-C layer) is the output of a standard convolutional layer that is normalized along the output channel dimension – the convolutional layer must have one input channel and two output channels. After the normalization, each pixel in the two-dimensional output of the U-C layer lies on the unit circle.
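The normalization step can be sketched in plain Python as follows. The sketch assumes that the two output channels of the convolution are already available as equally sized 2-D lists; the function name and the small `eps` guard against division by zero are assumptions of this illustration, not details given in the paper.

```python
import math


def unit_circle_normalize(channel_a, channel_b, eps=1e-12):
    """Normalize a two-channel response map so that each pixel, viewed as a
    2-D vector (a, b), has unit Euclidean norm.

    eps is a hypothetical safeguard for pixels where both responses are zero.
    """
    out_a, out_b = [], []
    for row_a, row_b in zip(channel_a, channel_b):
        norms = [max(math.hypot(a, b), eps) for a, b in zip(row_a, row_b)]
        out_a.append([a / n for a, n in zip(row_a, norms)])
        out_b.append([b / n for b, n in zip(row_b, norms)])
    return out_a, out_b


# Toy 2x2 responses of the two output channels.
resp_a = [[3.0, 0.0], [1.0, -2.0]]
resp_b = [[4.0, 5.0], [1.0, 0.0]]
unit_a, unit_b = unit_circle_normalize(resp_a, resp_b)

# Every pixel now lies on the unit circle: a^2 + b^2 is 1 up to rounding.
for row_a, row_b in zip(unit_a, unit_b):
    print([round(a * a + b * b, 6) for a, b in zip(row_a, row_b)])
```

In a framework such as PyTorch, the same operation could presumably be expressed with `torch.nn.functional.normalize` applied along the channel dimension of the convolution output.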
When multiple U-C layers are used, we define the concatenation of their responses , where is the number of U-C layers, is the response of the -th U-C filter, is a concatenation of all parameters of the filters. In IrisMatch-CNN, five U-C layers are used, i.e. we get five pairs of responses or output channels for each normalized image of iris.
We follow a custom padding strategy. In the vertical direction, we pad with zeros. Since the normalized image is stored in polar coordinates, its left and right borders are adjacent on the iris, so in the horizontal direction we pad by wrapping around. For the left side of the image we: (i) compute the integer part of half the width of the filter, (ii) create a copy of the normalized iris by selecting that many pixels from the right side of the image, (iii) append the copy to the left side of the image. We repeat the procedure analogously for the right side.
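A minimal sketch of this padding scheme is given below. It assumes the image is a 2-D list of rows; the function name is hypothetical, and using half the filter height as the amount of vertical zero padding is an assumption, since the text only specifies the horizontal amount.

```python
def pad_normalized_iris(image, filter_height, filter_width):
    """Zero-pad vertically and wrap around horizontally.

    image is a 2-D list (rows x columns) of a normalized iris in polar
    coordinates, so its left and right borders are neighbours on the iris.
    """
    pad_v = filter_height // 2  # assumed vertical padding amount
    pad_h = filter_width // 2   # integer part of half the filter width
    width = len(image[0])
    if pad_h > width:
        raise ValueError("filter is too wide for this image")

    wrapped = []
    for row in image:
        left = list(row[width - pad_h:])  # pixels copied from the right side
        right = list(row[:pad_h])         # pixels copied from the left side
        wrapped.append(left + list(row) + right)

    zero_row = [0] * (width + 2 * pad_h)
    top = [list(zero_row) for _ in range(pad_v)]
    bottom = [list(zero_row) for _ in range(pad_v)]
    return top + wrapped + bottom


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
for row in pad_normalized_iris(image, filter_height=3, filter_width=3):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [4, 1, 2, 3, 4, 1]
# [8, 5, 6, 7, 8, 5]
# [0, 0, 0, 0, 0, 0]
```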
3.0.2 Matcher
The Matcher is a fully convolutional neural network that produces a single scalar – the probability that two irises match. Let be the output of the Matcher CNN for the pair of -th and -th normalized-iris images and a concatenation of all convolutional filter parameters. The input of the Matcher CNN is , where is the output of all U-C layers for the normalized iris and is a concatenation of the parameters of filters of all U-C layers.
In other words, the input of the network is created as follows. A normalized iris is fed into the U-C layers. The output of the U-C layers is concatenated. The same procedure is repeated for the second normalized iris. Finally, the two sets of responses are concatenated creating the input of the Matcher network. Note that the normalized-iris images are fed through the same set of U-C layers.
3.0.3 Learning
The binary cross-entropy is used as the objective function. If two irises match, the desired prediction is , otherwise. In all experiments, the Matcher network was trained first - the weights of the U-C layers were initialized randomly and fixed. After approx. epochs, we started training the weights of the whole IrisMatch-CNN. We applied this scheme to speed up the training – if the whole network was trained from the beginning, the network converged approx. times slower or did not converge at all. The training data are heavily imbalanced towards the negative class (up to ratio). We manually balanced the classes by randomly selecting the negative examples, where is the number of positive examples. We repeated the sub-sampling of the negative class in each epoch.
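The objective and the per-epoch sub-sampling of the negative class can be sketched as follows. The sketch assumes the usual convention of target 1 for a matching pair and 0 otherwise; the function names, the `eps` clipping, and the default of one negative example per positive example are assumptions of this illustration, since the ratio is not recoverable from the text above.

```python
import math
import random


def binary_cross_entropy(prediction, target, eps=1e-7):
    """Binary cross-entropy for one predicted match probability.

    target is assumed to be 1 for a matching pair and 0 otherwise; eps clips
    the prediction away from 0 and 1 to keep the logarithm finite.
    """
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def balanced_epoch(positive_pairs, negative_pairs, negatives_per_positive=1,
                   rng=random):
    """Build the training examples of one epoch: all positive pairs plus a
    fresh random subset of the negative pairs.

    negatives_per_positive is a hypothetical parameter of this sketch.
    """
    n_neg = min(len(negative_pairs),
                negatives_per_positive * len(positive_pairs))
    sampled = rng.sample(negative_pairs, n_neg)
    examples = [(pair, 1) for pair in positive_pairs]
    examples += [(pair, 0) for pair in sampled]
    rng.shuffle(examples)
    return examples


positives = [("a1", "a2"), ("b1", "b2")]
negatives = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
             ("a1", "c1"), ("b1", "c1")]
for epoch in range(2):
    # A different subset of negatives may be drawn in every epoch.
    print(balanced_epoch(positives, negatives))
```

Calling `balanced_epoch` again at the start of every epoch mirrors the repeated sub-sampling described above, so the network eventually sees many different negative pairs while each epoch stays balanced.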
Technical details
In the Matcher CNN, standard blocks of convolutions and Exponential Linear Unit [4] activation functions were used. Also, batch normalization and dropout were applied. We trained the network in the PyTorch 1.0 library with the Adam optimizer and the learning rate set to . The set of all input normalized-iris images . The output of all U-C layers for the normalized-iris image . Stride and padding are the same in all five layers.
The training set was split to the training and validation subset with the ratio of . IrisMatch-CNN has parameters in total (compare with approx. of [10]).
4 Experiments
The quality of iris detection and segmentation has a dramatic effect on the performance of iris recognition pipeline [7]. Since different authors use different iris segmentation methods, reproducibility of the results reported in iris-verification papers is usually low. Therefore, we follow the methodology of [21] – the authors made their codes public along with the segmentations.
In the experiments, we evaluate the methods with the True Accept Rate (TAR) for a given False Accept Rate (FAR). FAR is the fraction of non-matching pairs of iris images classified as matches; TAR is the fraction of matching pairs of iris images classified as matches.
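For illustration, TAR at a given FAR can be computed from the scores of matching (genuine) and non-matching (impostor) pairs. The sketch below assumes that a higher score means a more likely match and that a pair is accepted when its score is at least the threshold; the function name and the toy scores are hypothetical.

```python
def tar_at_far(genuine_scores, impostor_scores, target_far):
    """TAR at the loosest threshold whose FAR does not exceed target_far.

    FAR = accepted impostor pairs / all impostor pairs,
    TAR = accepted genuine pairs / all genuine pairs.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores),
                        reverse=True)
    best_tar = 0.0
    for threshold in thresholds:
        far = (sum(s >= threshold for s in impostor_scores)
               / len(impostor_scores))
        if far > target_far:
            break  # lowering the threshold further only increases FAR
        best_tar = (sum(s >= threshold for s in genuine_scores)
                    / len(genuine_scores))
    return best_tar


genuine = [0.9, 0.8, 0.6, 0.3]          # scores of matching pairs
impostor = [0.7, 0.4, 0.2, 0.1, 0.05]   # scores of non-matching pairs
print(tar_at_far(genuine, impostor, target_far=0.2))  # 0.75
print(tar_at_far(genuine, impostor, target_far=0.0))  # 0.5
```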
4.0.1 Datasets
As discussed earlier in this section, we follow the evaluation procedure introduced in [21]. However, we were not able to retrieve the “WVU Non-ideal Iris Database - Release 1” since it is currently available only to residents of the United States. Therefore, we present the results on three datasets – ND-IRIS-0405, CASIA v4, and IITD. See Fig. 4 for the number of samples in the training and testing subsets and for sample images. For all datasets, the iris segmentations provided in the scripts of [21] were used to extract the iris in the testing sets. For the training sets, the irises were segmented with the method introduced in [20]. The models with the highest GAR (Genuine Accept Rate, i.e. TAR) on the validation subset were selected for the evaluation on the test set.
ND-IRIS-0405
The ND-IRIS-0405 Iris Image Dataset (ICE 2006) [3] contains iris samples from subjects. The training set for this database was composed of from the left eye images from all subjects and the test set from the first 10 right eye images from all subjects.
CASIA Iris Image Database V4 - distance
The “distance” subset of the CASIA dataset [1] contains samples from subjects. The “distance” subset is composed of images of the upper part of a face – each image contains both eyes. The authors of [21] provide segmentation of eyes for the subjects in the testing set. For the training set, the eyes were localized with the IntraFace facial landmarks detector [18] using the facial landmarks near the eyes. The training set contains only the right eye images, the test set includes only the left eye images.
IITD Iris Database
The IITD database [2] includes image samples from subjects. There are only the right eye images in the training set. In the test set, only the first five left eye images were used.
4.1 Comparative study
In this experiment, a comparative study on three public datasets against four state-of-the-art methods is presented.
First, we briefly describe the methods. The most widely deployed iris feature descriptor is the Gabor-filter-based IrisCode [5]. It is a highly competitive method suitable for a performance benchmark [21]. A popular public implementation of IrisCode is the open-source iris recognition tool OSIRIS v4.1 [12]. It uses a band of tunable 2D Gabor filters encoding the iris features at different scales. Another IrisCode method [9] uses 1D log-Gabor filter(s) to extract the features. Ordinal is an approach checking the consistency of ordinal measures in irises [15].
The ICCV17 and IrisMatch-CNN methods are presented in two configurations. “CrossDB” means that the model was trained only on the training set of the ND-IRIS-0405 database and “WithinDB” means that the model was also fine-tuned on the training set of the target database.
We present the results reported in [21] – the results were reproduced with the scripts provided by the authors. However, despite significant efforts, we were not able to reproduce the results in case of ND-IRIS-0405 database. We therefore exclude the ICCV17 method from the comparison in case of this database. Note that in all experiments presented in [21], the vertical resolution of the normalized iris image is pixels. The IrisMatch-CNN was developed with the input vertical resolution of pixels. Therefore, we resize to the required vertical resolution using the linear interpolation. All methods were extensively tuned on the target databases to ensure a fair comparison – the details are provided in [21].
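The vertical resizing mentioned above can be sketched as linear interpolation along the row axis. The exact interpolation convention is not stated in the text, so the sketch assumes that the first and last rows are preserved (an “align corners” convention); the function name and the toy image sizes are hypothetical.

```python
def resize_rows_linear(image, new_height):
    """Resize a 2-D list along the vertical axis with linear interpolation.

    Assumes the first and last rows of the input map onto the first and last
    rows of the output; other conventions are equally plausible.
    """
    old_height = len(image)
    if old_height < 1 or new_height < 1:
        raise ValueError("heights must be positive")
    if old_height == 1 or new_height == 1:
        return [list(image[0]) for _ in range(new_height)]

    scale = (old_height - 1) / (new_height - 1)
    resized = []
    for r in range(new_height):
        pos = r * scale                         # position in the source image
        lower = min(int(pos), old_height - 2)   # index of the row above
        frac = pos - lower                      # weight of the row below
        resized.append([(1 - frac) * a + frac * b
                        for a, b in zip(image[lower], image[lower + 1])])
    return resized


image = [[0.0, 10.0],
         [2.0, 20.0],
         [4.0, 30.0]]
for row in resize_rows_linear(image, 5):
    print(row)
# [0.0, 10.0]
# [1.0, 15.0]
# [2.0, 20.0]
# [3.0, 25.0]
# [4.0, 30.0]
```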
Looking at Fig. 5, IrisMatch-CNN yields the best results on all three databases. We see that, compared to the other methods, the performance of IrisMatch-CNN shows a different trend – TAR tends to be higher for a wider range of FAR, which is especially visible in the case of the ND-IRIS-0405 database. We believe that this tendency is caused by the binary cross-entropy objective function.
Let us examine both “CrossDB” and “WithinDB” setup in Fig. 5 now. The results of IrisMatch-CNN on the IITD database do not differ much between these two settings. A difference of approx. per cent in favour of “WithinDB” setup is visible in case of the CASIA database. We conclude that IrisMatch-CNN generalizes well between different databases, i.e. the method is robust to changes in the iris image acquisition setup.
4.2 Effect of Unit-Circle layers on performance
We developed the Unit-Circle (U-C) layer as a replacement of the Gabor filtering step in the iris recognition pipeline. We interpret the outputs of the U-C layer as responses lying on the unit circle in a two-dimensional plane. In the following experiment, we replaced the normalization in the U-C layer by two non-linearities – by the Rectified Linear Unit, or ReLU, and the Exponential Linear Unit [4], or ELU. We trained the IrisMatch-CNN network with each non-linearity on the training set of the ND-IRIS-0405 database for epochs, selecting the model with the best validation performance.
As seen in Fig. 6 in case of all three databases, the TAR is higher for the network in which the U-C layers are used. From the plots (ii) and (iii), we conclude that the U-C layers improve generalization on unseen data.
4.3 Effect of iris masks on performance
The input of IrisMatch-CNN is a pair of normalized irises. In this experiment, we also included the masks estimated by the ICCV17 mask sub-network so that the input of the IrisMatch-CNN network is a pair of normalized irises with their masks. The normalized-iris mask is shown in Fig. 2. In a common iris-verification pipeline, it marks the areas not suitable for matching – e.g., eyelids, sclera, or reflections.
The results are shown in Fig. 6. There is no significant improvement when also the masks are included. Therefore, we conclude that the IrisMatch-CNN network is capable of determining the “good areas to match” by itself.
4.4 Effect of iris-segmentation method on IrisMatch-CNN performance
In this experiment, the task was to examine the robustness of IrisMatch-CNN against the change of the iris segmentation method. In our datasets, we segment the iris with the total variation method [20]. In this experiment, we employed a publicly available iris-verification software OSIRIS v4.1 that uses the Viterbi method [16]. We created two modified test subsets from the ND-IRIS-0405 and CASIA.v4 databases (see Fig. 7(b) for statistics). We followed the left/right eye splits in the testing datasets. However, we used all images that were successfully extracted by both the methods. The presented results were retrieved by the IrisMatch-CNN network that was trained on the standard training subset of the ND-IRIS-0405 database.
The first thing that needs a comment is the much higher performance visible in Fig. 7(a) in the case of ND-IRIS-0405. The test set used in the other experiments contains only the first 10 right eye images from all subjects. In this experiment, we did not follow this limitation. Instead, we used all iris images that were successfully segmented by both methods. This condition resulted in an “easy-to-verify” set of normalized irises. Nevertheless, the first conclusion of this experiment is that, for the database on which IrisMatch-CNN was trained, the switch between the total variation method and the Viterbi method makes no difference in performance.
However, the results on the CASIA.v4-distance database give us a different view. The total variation method gives better TAR by for a FAR than the Viterbi method. We conclude that the IrisMatch-CNN method is not robust to changing the capture settings (i.e. the database) and the segmentation method at the same time.
Note that we excluded the IITD database from this experiment since the number of authentic pairs in the testing set, which we got by the previously described construction, was less than .
4.5 Heterogeneous iris verification
In this experiment, we inspect the performance of the IrisMatch-CNN model in the heterogeneous iris verification. In this type of verification, two images of irises are compared, but each iris is from a different source (as opposed to the previous experiments, in which the pair of irises was always from the same source).
For the purpose of this experiment, we use the ND-CrossSensor2013 database (available at https://cvrl.nd.edu). In this database, each iris is captured with both the LG2200 and LG4000 iris sensors. We follow the experimental protocols introduced by the authors of the database. More specifically, we use the “SigSets2013-Small-LG4000-LG2200” protocol that specifies which irises should be compared. In this protocol, there is a total number of comparisons. However, we segmented the normalized irises with the total variation method [20] and we were not able to segment the whole dataset. Therefore, in our experiment, there is a total number of comparisons (see Fig. 8(c) for numbers of positive class, or authentic, and negative class, or imposter, pairs).
Let us take a look at the results in Fig. 8(b). In the first experiment, we used the model trained on the ND-IRIS-0405 database training subset (see Fig. 5 for results on other datasets). Then, we tried to fine-tune the model on a training subset of the ND-CrossSensor2013 database. Compared to the results reported by the ACSTL Cross-Sensor Comparison Competition Team 2013 [13], our models perform more than worse at the false accept rate of . We believe that this is caused by the architecture of IrisMatch-CNN – we use the same set of Unit-Circle layers for both input images of irises. In fact, our experiment confirms the results of [8]. The authors design a special “pairwise filter bank” to account for differences between heterogeneous irises, i.e. irises from different sources. In our case, the source of the difference between the irises is the capturing device. Compared to the LG4000-based sensors, the LG2200-based sensors produce blurry images, commonly with strong interlacing (see Fig. 8(a)). We conclude that IrisMatch-CNN is not suitable for heterogeneous iris verification.
5 Conclusion
In this paper, we introduced a novel convolutional neural network architecture IrisMatch-CNN yielding state-of-the-art results in the iris-verification task on the ND-IRIS-0405, CASIA.v4-distance, and IITD databases. The input of the network is a normalized-iris image, the output is a single scalar interpreted as the probability of a match. A novel Unit-Circle layer was introduced that improves robustness of the model (i.e. the ability of the model to generalize on previously-unseen data), which is verified in experiments. We presented experiments in which a different iris-segmentation method: (a) does not affect the performance when evaluated on previously-seen data (b) decreases the performance otherwise. Lastly, we showed that IrisMatch-CNN is not suitable for heterogeneous iris verification, i.e. for matching two irises when each is from a different source.
The input of the IrisMatch-CNN model is a normalized-iris image. Therefore, the performance of the model heavily depends on the iris detection and segmentation methods. To the best of our knowledge, there is no work in which the detection, segmentation, and matching are performed end-to-end. We believe that the next steps in the iris verification domain will incorporate the detection and segmentation into a customized neural-network architecture that will yield excellent results and will be compact at the same time.
- [1] CASIA.v4 Iris Database, http://www.cbsr.ia.ac.cn/china/Iris%20Databases%20CH.asp
- [2] IIT Delhi Iris Database, http://www4.comp.polyu.edu.hk/~csajaykr/IITD/Database_Iris.htm
- [3] Bowyer, K.W., Flynn, P.J.: The ND-IRIS-0405 iris image dataset. Tech. rep., Notre Dame CVRL
- [4] Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv:1511.07289 [cs] (Nov 2015), http://arxiv.org/abs/1511.07289
- [5] Daugman, J.: How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology 14(1), 21–30 (Jan 2004)
- [6] Gangwar, A., Joshi, A.: DeepIrisNet: Deep iris representation with applications in iris recognition and cross-sensor iris recognition. In: 2016 IEEE International Conference on Image Processing (ICIP). pp. 2301–2305 (Sep 2016)
- [7] Li, Y.H., Huang, P.J., Juan, Y.: An Efficient and Robust Iris Segmentation Algorithm Using Deep Learning (2019), https://www.hindawi.com/journals/misy/2019/4568929/
- [8] Liu, N., Zhang, M., Li, H., Sun, Z., Tan, T.: DeepIris: Learning pairwise filter bank for heterogeneous iris verification. Pattern Recognition Letters 82, 154–161 (Oct 2016), http://www.sciencedirect.com/science/article/pii/S016786551500327X
