An approach to partial occlusion using deep metric learning

Received May 12, 2021; Revised Oct 27, 2021; Accepted Nov 10, 2021

The human face can be used as an identification and authentication tool in biometric systems. Face recognition in forensics is challenging because of partial occlusions such as hats, sunglasses, scarves, and beards. Identifying a criminal whose face is partially occluded is among the most difficult tasks in forensics. In this paper, a combination of the histogram of gradients (HOG) with Euclidean distance is proposed. Deep metric learning is the process of measuring the similarity between samples using a distance metric learned for the task. In the proposed system, a deep metric learning technique, HOG, is used to generate a 128-d real-valued feature vector. The Euclidean distance between feature vectors is then computed and compared against a tolerance threshold to decide whether the pair is a match or a mismatch. Experiments are carried out on the disguised faces in the wild (DFW) dataset collected from IIIT Delhi, which consists of 1000 subjects, of which 600 were used for testing and the remaining 400 for training. The proposed system provides a recognition accuracy of 89.8% and outperforms the existing methods it is compared with.


INTRODUCTION
The face is a unique trait of every individual and supports passive identification in one-to-many environments. Occlusion refers to extraneous objects that hide the face, e.g., a face partially covered by a hat, sunglasses, a scarf, or a beard [1]. The different types of occlusion that occur on a face are illustrated in Figure 1. Partial occlusions are less common in conventional face recognition because the face is captured under favorable conditions with good lighting [2]. In forensic face recognition, however, the face is often captured under adverse conditions with poor lighting. The presence of partial occlusions makes identification difficult, which in turn degrades the overall performance of the system. Thus, partial occlusion plays a significant role in forensics. If an individual commits a crime and the face is captured under adverse conditions with partial occlusions, recognizing the suspect becomes a very difficult and challenging task.
As a result, recognizing partially occluded faces in forensics is more challenging than conventional face recognition [3]. The main idea behind this work is to recognize partially occluded faces using deep metric learning. Challenges that can be addressed in forensic face recognition include obstructions on the face (partial occlusion), facial marks, and faces captured by surveillance cameras at the time of the crime. This paper addresses the problem of partial occlusion in forensic face recognition. The key contributions of this work are as follows:
- An approach based on the combination of the histogram of gradients (HOG) with Euclidean distance is proposed.
- The proposed system is evaluated on the disguised faces in the wild (DFW) dataset, which contains 1000 subjects with 1,11,157 images.
- The performance of the proposed system is compared with other existing methods.
- The proposed system outperforms the other existing methods in terms of recognition accuracy.

The paper is organized as follows: Section 1 introduces face recognition and the challenges in forensic face recognition. Section 2 reviews the literature on partial occlusion in face recognition. Section 3 presents a detailed discussion of the proposed system. Section 4 discusses the experimental results. Finally, Section 5 presents the concluding remarks.

LITERATURE SURVEY
This section briefly reviews the literature on partial occlusion-based face identification and the methods applied to it. A method based on a multi-cascaded convolutional neural network (CNN) is proposed in [4]; it fuses features at different levels to produce feature vectors, and classification is then performed using cosine distance. The deep disguise recognizer network (DDRNET) model uses an inception network to train on preprocessed images and computes a similarity metric that is used for classification [5].
Singh et al. [4] proposed a deep convolutional neural network (DCNN) based approach for recognition, in which features were trained using two different networks and the resulting features were used for recognition. Occluded face recognition relies on both global and local features, and combining local features significantly improves recognition accuracy under partial occlusion [6]. Azeem et al. [6] proposed a lateral subspace strategy to acquire the local features, with linear discriminant analysis used to identify inter-class and intra-class variations through a weighting process. Experiments on the AR, Japanese female facial expression (JAFFE), face recognition technology (FERET), and Extended Cohn-Kanade (CK+) datasets gave promising results [7]. Wen et al. [8] proposed a structure-based occlusion coding method that includes a structured dictionary and structured sparsity. This occlusion coding is used to separate the occlusion from the face image, after which classification is performed for recognition. Min et al. [9] proposed an approach that detects occlusions on the face and applies selective local Gabor patterns to the non-occluded facial regions. Experimental results show that the method in [10] outperforms other existing methods. An approach based on a weighted matrix is proposed in [11], which consists of two phases: occlusion detection and recognition. In the first phase, partial occlusions are detected using a feature-based approach; in the second, a support vector machine (SVM) with a Euclidean nearest neighbor classifier is used for face recognition.
Jozef et al. [12] proposed a large-scale deep learning method that uses a CNN to detect faces. Experiments were carried out on the annotated faces in the wild (AFW) dataset and the face detection dataset and benchmark (FDDB). The experimental results indicate that the large-scale deep learning method yields a significant improvement in overall recognition accuracy on the AFW dataset.
Devi [13] presents a comparative study of different methods, including principal component analysis, K-principal component analysis, and SVM, on partially occluded images. Jozef et al. [12] use the Viola-Jones algorithm for face detection, and the detected occluded images are sent by email to an authorized person.
Wu et al. [14] proposed a novel object tracking algorithm called partial occlusion by background alignment (POBA). The POBA tracker outperforms six other trackers on the OTB2015 benchmark dataset. An approximation algorithm based on a greedy approach is presented in [15]. In the first step, a string generation algorithm converts the face into a string of characters. A greedy approximation algorithm then performs string matching. Experiments on the FEI, IAB, ORL, and extended Yale-B datasets show a significant improvement in recognition accuracy.
Athreya et al. [16] proposed an approach that integrates compositional models with a DCNN, generating a unified deep model that is robust to partial occlusion. A survey on partially occluded faces is presented in [15], which describes the different methods applied to partially occluded faces. The comparative study shows that REGT provides promising results and performs better than other existing methods. Most of the works [12]-[16] address partial occlusion-based face recognition, but none of them has explored a combination of HOG with Euclidean distance. To address this gap, this work combines the two with the aim of improving recognition accuracy.

PROPOSED SYSTEM
This section briefly discusses the implementation of the proposed system for partial occlusion-based face recognition. The proposed system is evaluated on the DFW dataset [16], which consists of 1000 subjects with 1,11,057 images. The dataset is divided into testing and training sets, in which 600 subjects are used for testing and the remaining 400 subjects are used for training.
The proposed system uses a deep metric learning technique that generates a real-valued feature vector as its output. For the dlib facial recognition network, the output is a 128-d feature vector that is used to quantify the face. The network is pre-trained using triplets: two images are faces of the same subject, and the third is a random face from the dataset that belongs to a different subject. The feature vector quantified from the face is serialized and stored in the database.
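The triplet construction described above can be illustrated with a short sketch. It is not the training code of the dlib network; the function name `sample_triplet` and the mapping from subject to image file names are hypothetical and serve only to show how an anchor, a positive, and a negative are chosen.

```python
import random


def sample_triplet(images_by_subject, rng=random):
    """Return (anchor, positive, negative) image identifiers.

    The anchor and positive come from the same subject; the negative comes
    from a different, randomly chosen subject.
    """
    # Only subjects with at least two images can supply an anchor/positive pair.
    eligible = [s for s, imgs in images_by_subject.items() if len(imgs) >= 2]
    subject = rng.choice(eligible)
    anchor, positive = rng.sample(images_by_subject[subject], 2)

    # The negative is drawn from any subject other than the anchor's subject.
    others = [s for s in images_by_subject if s != subject]
    negative = rng.choice(images_by_subject[rng.choice(others)])
    return anchor, positive, negative


# Hypothetical toy data: subject id -> image file names.
toy_images = {
    "s1": ["s1_a.jpg", "s1_b.jpg"],
    "s2": ["s2_a.jpg"],
    "s3": ["s3_a.jpg", "s3_b.jpg", "s3_c.jpg"],
}
anchor, positive, negative = sample_triplet(toy_images, random.Random(0))
```

During training, the network is encouraged to place the anchor closer to the positive than to the negative in the 128-d feature space.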
If a new, partially occluded face needs to be recognized and the corresponding identity verified, the same HOG model is used to quantify the new face into a 128-d feature vector. This vector is then compared with the stored face encodings using the Euclidean distance between the feature vectors [17].
A tolerance threshold is set before the comparison, and each comparison is classified as a match or a mismatch. The subject with the most matches is taken as the recognized face in the image sample being tested. Either a CNN-based model or a 5-point HOG model from dlib can be used to perform the quantification. The HOG model performs better, is faster, and is more tolerant of partial facial occlusion, and hence it is employed in the proposed system. The proposed system consists of six phases: image acquisition, pre-processing, face quantification, feature vector similarity detection, serialization, and post-processing, as depicted in the accompanying figure.

Pre-processing stage
In the pre-processing stage, face detection from standard OpenCV methods using the local binary pattern histogram (LBPH) is employed to isolate the faces and extract them for the subsequent steps. Since the faces do not contain many sharp-contrast features, there is no need to apply extensive sharpening filters or unsharp masking. All images are converted from the BGR color scheme to the RGB color scheme for compatibility and uniformity. A detailed explanation of the image acquisition step is provided in the experimental results section.
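The BGR-to-RGB conversion amounts to reversing the channel order of every pixel. In OpenCV this is done with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`; the sketch below shows the same operation in plain Python on a toy image stored as nested lists, so that it runs without any dependency. The helper name `bgr_to_rgb` is hypothetical.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of an image stored as rows of [B, G, R] pixels."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 toy image in BGR order: one pure-blue pixel and one pure-red pixel.
bgr_image = [[[255, 0, 0], [0, 0, 255]]]
rgb_image = bgr_to_rgb(bgr_image)
```

The conversion matters because OpenCV loads images in BGR order, whereas dlib-based encoders typically expect RGB input.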

Face quantification method
The HOG model is trained on a dataset of 3 million images using a triplet training method, in which two images of the same subject and one image of a different subject (a negative) are used. This model has an accuracy of 99.38% on the labeled faces in the wild (LFW) dataset, so transfer learning is applied to quantify the faces in the DFW dataset. The HOG model is a ResNet network with 29 convolutional layers, in which a few layers are removed and the number of filters per layer is reduced by half. The network is trained from randomly initialized weights with a structured metric loss that tries to project all identities into non-overlapping balls of radius 0.6. The loss is essentially a pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level. The threshold for the similarity index is derived empirically to maximize verification accuracy, bearing in mind that the loss projects identities into non-overlapping balls of radius 0.6.
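A simplified version of such a pair-wise hinge loss is sketched below. It is an illustration under stated assumptions, not the loss implementation used by dlib: the `margin` value of 0.04, the averaging of the loss terms, and the rule of keeping as many hard negatives as there are positive pairs are all assumptions made for this example.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_hinge_loss(embeddings, labels, threshold=0.6, margin=0.04):
    """Simplified pair-wise hinge loss over one mini-batch.

    Same-identity pairs are penalized when they are farther apart than
    threshold - margin; different-identity pairs are penalized when they are
    closer than threshold + margin. Only the hardest negative pairs are kept,
    as many as there are positive pairs (hard-negative mining).
    """
    pos_losses, neg_losses = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            pos_losses.append(max(0.0, d - (threshold - margin)))
        else:
            neg_losses.append(max(0.0, (threshold + margin) - d))

    # Hard-negative mining: keep the negative pairs with the largest loss.
    neg_losses.sort(reverse=True)
    hard_negatives = neg_losses[:len(pos_losses)]

    terms = pos_losses + hard_negatives
    return sum(terms) / len(terms) if terms else 0.0


# Two well-separated identities in a toy 2-d space: the loss is zero.
separated = pairwise_hinge_loss(
    [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.1, 0.0]], ["a", "a", "b", "b"]
)
# One identity is spread out and overlaps another: the loss is positive.
overlapping = pairwise_hinge_loss(
    [[0.0, 0.0], [1.0, 0.0], [0.2, 0.0]], ["a", "a", "b"]
)
```

A loss of zero means every same-identity pair lies inside the threshold and every different-identity pair lies outside it, which is the geometric condition that makes a fixed distance threshold usable at recognition time.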

Feature vector similarity detection method
Although several methods exist to measure the similarity of 128-d feature vectors, the proposed system employs the Euclidean distance. The Euclidean distance d(p, q) in two dimensions is generalized to n = 128 dimensions, where p and q are the two feature vectors and n is the number of components in each vector. This distance measures the proximity of the sample feature vector to the reference feature vector [14], as shown in (1).

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}, \qquad n = 128 \tag{1}$$
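The distance computation and the match decision can be sketched in a few lines of plain Python. The function names and the tolerance of 0.6 are assumptions for illustration; the paper states that the threshold is derived empirically and does not report its value.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def is_match(sample, reference, tolerance=0.6):
    """Declare a match when the distance does not exceed the tolerance."""
    return euclidean_distance(sample, reference) <= tolerance


# Toy 128-d vectors (illustrative values, not real face encodings).
reference = [0.0] * 128
near_sample = [0.05] * 128   # distance = sqrt(128 * 0.05**2), about 0.566
far_sample = [0.1] * 128     # distance = sqrt(128 * 0.1**2), about 1.131

near_result = is_match(near_sample, reference)
far_result = is_match(far_sample, reference)
```

A smaller tolerance makes the system stricter (fewer false matches, more missed matches), and a larger tolerance does the opposite.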

Serialization
The pickle module in Python is used to serialize and deserialize the feature vectors. It is Python's standard mechanism for object serialization and deserialization. "Pickling" is the process whereby a Python object hierarchy is converted into a byte stream, and "unpickling" is the inverse operation, whereby a byte stream is converted back into an object hierarchy. In the proposed system, serialization is performed using the pickle module, which converts the Python object derived from the input facial image (its feature vector) into a byte stream.
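A minimal round trip with the pickle module is sketched below. The dictionary layout (subject name mapped to a list of 128-d encodings) and the subject names are assumptions made for illustration; the paper does not specify how the stored records are organized.

```python
import pickle

# Hypothetical enrollment records: subject name -> list of 128-d encodings.
encodings = {
    "subject_001": [[0.1] * 128],
    "subject_002": [[0.2] * 128, [0.25] * 128],
}

# "Pickling": convert the object hierarchy into a byte stream.
blob = pickle.dumps(encodings)

# "Unpickling": rebuild an equal object hierarchy from the byte stream.
restored = pickle.loads(blob)
```

To persist the encodings on disk, `pickle.dump(encodings, f)` and `pickle.load(f)` can be used with a file opened in binary mode. Pickle data should be loaded only from trusted sources, since unpickling can execute arbitrary code.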

Identification of partially occluded faces
This is the last step of the proposed system. Here, the byte stream generated from the given facial image is used to recognize partially occluded faces.
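The identification step can be sketched as follows: the stored encodings are deserialized, each is compared with the sample, and the subject with the most matches within the tolerance is returned. The function name `identify`, the tolerance of 0.6, the toy 3-d vectors, and the choice to return `None` when nothing matches are all assumptions for illustration; the same logic applies unchanged to 128-d vectors.

```python
import math
import pickle
from collections import Counter


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def identify(sample, serialized_db, tolerance=0.6):
    """Return the subject with the most encodings within tolerance, or None."""
    database = pickle.loads(serialized_db)
    votes = Counter()
    for name, stored_encodings in database.items():
        for reference in stored_encodings:
            if euclidean_distance(sample, reference) <= tolerance:
                votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical serialized database with toy 3-d encodings.
serialized_db = pickle.dumps({
    "alice": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    "bob": [[1.0, 1.0, 1.0], [0.3, 0.0, 0.0]],
})

known = identify([0.05, 0.0, 0.0], serialized_db)    # alice: 2 votes, bob: 1
unknown = identify([5.0, 5.0, 5.0], serialized_db)   # nothing within tolerance
```

Counting votes over all stored encodings of a subject makes the decision less sensitive to a single stored image that happens to lie close to the sample.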

RESULTS AND DISCUSSION
This section presents a detailed discussion of the experiments carried out on the DFW dataset. The dataset, collected from IIIT Delhi [16], consists of 1000 subjects and 1,11,057 images, as summarized in Table 1.
Experiments were carried out in two scenarios. In the first scenario, the input image is a partially occluded face of a subject included in training, and the probability of a correct match is high. Results of correct matches in partial occlusion-based recognition are shown in Figure 3. In the second scenario, the input image is a partially occluded face of a subject not included in training, so the proposed system finds the nearest matching face and displays the corresponding output. In this scenario, the face is detected, but the displayed name is not that of the actual individual. Results of incorrect matches in partial occlusion-based recognition are shown in Figure 4.
The proposed system provides an accuracy of 89.8% on the DFW dataset, and its performance is compared with other existing algorithms in Table 2. A graphical representation of the comparative study is given in Figure 5. In this plot, the X-axis denotes the existing methods and the proposed system, and the Y-axis denotes the recognition accuracy. The comparative study shows that the proposed system achieves a recognition accuracy of 89.8% and outperforms the other existing methods. The proposed system, which combines HOG with Euclidean distance, is faster and more tolerant of partial occlusion in face recognition.
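The paper does not spell out how recognition accuracy is computed. Assuming it is the percentage of test images whose predicted identity equals the ground-truth identity, it can be sketched as below; the labels are toy values for illustration and are not results from the DFW experiments.

```python
def recognition_accuracy(predicted, actual):
    """Percentage of samples whose predicted identity equals the true identity."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy labels: four of the five predictions are correct.
toy_predicted = ["s1", "s2", "s3", "s1", "s5"]
toy_actual = ["s1", "s2", "s3", "s4", "s5"]
toy_accuracy = recognition_accuracy(toy_predicted, toy_actual)
```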

CONCLUSION
The main aim of this research is to propose an efficient system for recognizing partially occluded face images. Partial occlusion face identification is one of the challenges in forensics. This work addresses the problem using a combination of HOG with Euclidean distance. The proposed system achieves a recognition accuracy of 89.8%, which compares favorably with other existing methods. The proposed system helps in identifying suspects involved in criminal activities. Optimizing the threshold used for face comparison and applying more intelligent methods for calculating proximity offer opportunities for further research in partial occlusion-based recognition.