Application of Texture Descriptors to Facial Emotion Recognition in Infants

Featured Application: A system to detect pain in infants using facial expressions has been developed. Our system can be easily adapted to a mobile app or a wearable device. The recognition rate is above 95% when using the Radon Barcodes (RBC) descriptor. It is the first time that RBC is used in facial emotion recognition.

Abstract: The recognition of facial emotions is an important issue in computer vision and artificial intelligence due to its important academic and commercial potential. If we focus on the health sector, the ability to detect and control patients' emotions, mainly pain, is a fundamental objective within any medical service. Nowadays, the evaluation of pain in patients depends mainly on the continuous monitoring of the medical staff when the patient is unable to express verbally his/her experience of pain, as is the case of patients under sedation or babies. Therefore, it is necessary to provide alternative methods for its evaluation and detection. Facial expressions can be considered as a valid indicator of a person's degree of pain. Consequently, this paper presents a monitoring system for babies that uses an automatic pain detection system by means of image analysis. This system could be accessed through wearable or mobile devices. To do this, this paper makes use of three different texture descriptors for pain detection: Local Binary Patterns, Local Ternary Patterns, and Radon Barcodes. These descriptors are used together with Support Vector Machines (SVM) for their classification. The experimental results show that the proposed features give a very promising classification accuracy of around 95% for the Infant COPE database, which proves the validity of the proposed method.


Introduction
Facial expressions are one of the most important stimuli when interpreting social interaction, as they provide information on the identity of the person and on their emotional state. Facial emotions are one of the most important signal systems through which human beings convey to other people what is happening to them [1].
The recognition of facial expressions is especially interesting because it allows for detecting feelings and moods in people, which are applicable in fields such as psychology, teaching, marketing or even health, which is the main objective of this work.
The automatic recognition of facial expressions could be a great advance in the field of health, in applications such as pain detection in people unable to communicate verbally, decreasing the continuous monitoring by medical staff, or for people with Autism Spectrum Disorder, for instance, who have difficulty understanding other people's emotions. [...] diagnosis, or commercial devices with a smartphone application for parents [29] (smart socks [30] or the popular video monitors [31]).
However, no smartphone application or wearable device related to pain detection through facial expression recognition in infants has been found. Therefore, this work investigates different methods to implement a reliable tool to assist in the automatic detection of pain in infants using computer vision and supervised learning, extending our previous work presented in [2]. As mentioned before, texture descriptors and, specifically, Local Binary Patterns, are among the most popular algorithms to extract features for facial emotion recognition. Thus, this work compares the results obtained with several texture descriptors, including Radon Barcodes, which are used here for the first time to detect facial emotions; this is the main contribution of this paper. Moreover, our tool can be easily implemented in a wireless and wearable system, so it could have many potential applications, such as alerting parents or medical staff quickly and efficiently when a baby is in pain.
This paper is organized as follows: Section 2 explains the main features of the methods used in our research and outlines the proposed method; Section 3 describes the experimental setup, the experiments completed, and their interpretation; and, finally, conclusions and future work are discussed in Section 4.

Materials and Methods
In this section, some theoretical concepts are explained first. Then, at the end of the section, the method followed to determine whether a baby is in pain or not is described.

Pain Perception in Babies
Traditionally, babies' pain has been undervalued and has received limited attention, on the belief that babies suffer less pain than adults because of their supposed 'neurological immaturity' [32,33]. This has been refuted by several studies over the last few years, especially by the one conducted at the John Radcliffe Hospital in Oxford in 2015 [34], which concluded that infants' brains react in a very similar way to adult brains when exposed to the same pain stimulus. Recent works suggest that infant units in hospitals must adopt reliable pain assessment tools, since pain may lead to short- and long-term sequelae [35,36].
As mentioned before, the impossibility of expressing pain verbally has created the need to use other means to assess pain, detect it, and take the appropriate actions. This is why pain assessment scales based on behavioral indicators have been created, such as PIPP (Premature Infant Pain Profile) [37], CRIES (Crying; Requires increased oxygen administration; Increased vital signs; Expression; Sleeplessness) [38], NIPS (Neonatal Infant Pain Scale) [39], or NFCS (Neonatal Facial Coding System) [40,41]. While most assessment scales use vital signs such as heart rate or oxygen saturation, NFCS is based on facial changes produced by the face muscles, mainly forehead protrusion, contraction of the eyelids, nasolabial groove, horizontal stretch of the mouth, and tense tongue [42]. Figure 1 shows a graphical example of the NFCS scale. As this paper uses an image database, this last scale is ideal to determine whether or not the babies are in pain, by analyzing the facial changes in different areas according to the NFCS scale.

Feature Extraction
Feature extraction methods of facial expressions can be divided depending on their approach. Generally speaking, features are extracted from facial deformation, which is characterized by changes in shape and texture, and from facial motion, which is characterized by either the speed and direction of movement or deformations in the face image [43,44].
As explained in the last section, in this paper, the NFCS scale has been selected, since its reliability, validity, and clinical utility have been extensively proved [45,46]. The criterion for classifying pain in the NFCS scale is based on facial deformations and depends on the texture of the face. Texture descriptors have been widely used in machine learning and pattern recognition, being successfully applied to object detection, face recognition, and facial expression analysis, among other applications [47]. Consequently, three texture descriptors are taken into account in this research: the popular Local Binary Patterns descriptor; a variation of this descriptor, the Local Ternary Patterns; and, finally, a recently proposed descriptor, the Radon Barcodes, which are based on the Radon transform.

Local Binary Patterns
Local Binary Patterns (LBP) are a simple but effective texture descriptor which labels every pixel of the image by analyzing its neighborhood. It checks whether the grey level of each neighboring pixel is above a certain threshold and encodes this comparison as a binary number. This descriptor has become very popular due to its good classification accuracy and its low computational cost, which allows real-time image processing in many applications. In addition, this descriptor is highly robust to varying lighting conditions [48,49].
In its basic version, the LBP operator works with a 3 × 3 matrix that goes across the image pixel by pixel, identifying the grey values of its eight neighbors and taking as a threshold the grey value of the central pixel. Thus, the binary code is obtained as follows: if a neighbor pixel has a lower value than the central one, it is coded as 0; otherwise, it is coded as 1. Finally, each binary value is weighted by its corresponding power of two and the results are added to obtain the LBP code of the pixel. In Figure 2, a graphic example is shown. This descriptor has been extended over the years, so that it can be used in circular neighborhoods of different sizes. In this circular version, neighbors are equally spaced, allowing the use of any radius and any number of neighboring pixels. Once the codes of all pixels are obtained, a histogram is created. It is also common to divide the image into cells, so that one histogram per cell is obtained and the histograms are finally concatenated. In addition, the LBP descriptor supports uniform patterns, which significantly reduce negligible information and therefore provide low computational cost and invariance to rotations, two important properties when applied to facial expression recognition in mobile and wearable devices [50].
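The basic 3 × 3 operator described above can be illustrated with a minimal Python sketch. This is not the MATLAB implementation used in this work; the function names, the clockwise neighbor ordering, and the example patch are assumptions made only for illustration.

```python
def lbp_code(img, r, c):
    """Basic 3x3 LBP code of the pixel at row r, column c."""
    center = img[r][c]
    # Eight neighbors, clockwise from the top-left corner.
    # The ordering (and hence the bit weights) is an illustrative assumption.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Neighbors lower than the center are coded 0, all others 1,
        # and each 1 is weighted by its power of two.
        if img[r + dr][c + dc] >= center:
            code += 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Hypothetical 3x3 grey-level patch; the central pixel has value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code)
```

The circular version with arbitrary radius and number of neighbors would additionally require interpolating grey values at non-integer positions, which this sketch omits.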

Local Ternary Patterns
Tan and Triggs [51] presented a new texture operator which is more robust to noise than LBP in uniform regions. It consists of an LBP extended to 3-valued codes (−1, 0, 1). Figure 3 shows a practical example of how Local Ternary Patterns (LTP) work. First, a threshold t is established. Then, if a neighbor pixel has a value below the value of the central pixel minus the threshold, it is assigned −1; if its value is above the value of the central pixel plus the threshold, it is assigned 1; otherwise, it is assigned 0. After the thresholding step, the upper and lower patterns are constructed as follows: for the upper pattern, all 1s are kept as 1 and the remaining values (0s and −1s) are set to 0; for the lower pattern, all −1s are set to 1 and the remaining values (0s and 1s) are set to 0. Finally, both patterns are encoded as two different binary codes, so this descriptor provides two binary codes per pixel instead of one as LBP does, that is, more information about the texture of the image. The LTP operator has been applied successfully to applications similar to those of LBP, including medical images, human action classification, and facial expression recognition, among others.
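The thresholding and the split into upper and lower patterns can be sketched in Python as follows. The function name, the neighbor ordering, and the example patch are assumptions for illustration only, not the implementation used in this work.

```python
def ltp_codes(img, r, c, t):
    """Upper and lower LTP binary codes of the pixel at (r, c) for threshold t."""
    center = img[r][c]
    # Eight neighbors, clockwise from the top-left corner (assumed ordering).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = 0
    lower = 0
    for bit, (dr, dc) in enumerate(offsets):
        value = img[r + dr][c + dc]
        if value > center + t:
            # Ternary value 1: contributes a 1 to the upper pattern only.
            upper += 1 << bit
        elif value < center - t:
            # Ternary value -1: contributes a 1 to the lower pattern only.
            lower += 1 << bit
        # Ternary value 0 contributes nothing to either pattern.
    return upper, lower


# Hypothetical 3x3 grey-level patch; the central pixel has value 6.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
upper, lower = ltp_codes(patch, 1, 1, t=2)
print(upper, lower)
```

With t = 2, only the neighbor with value 9 exceeds 6 + 2 and only the neighbors with values 2 and 1 fall below 6 − 2, so most neighbors are coded 0, which illustrates the tolerance to small grey-level variations.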

Radon Barcodes
The Radon Barcodes (RBC) operator is based on the Radon transform, which is attracting increasing interest in image processing, since it is extremely robust to noise and presents scale and rotation invariance [52,53]. Moreover, it has been used for years to process medical images, and is the basis of current computerized tomography. As mentioned before, facial expression features are based on facial deformations and involve changes in shape, texture, and motion. As the Radon transform presents valuable features regarding image translation, scaling, and rotation, its application to facial recognition of emotions has been considered in this work.
Essentially, the Radon transform is an integral transform which projects all pixels from different orientations to a single vector. Consequently, RBCs are basically built from the sum (integral) of the values along lines at different angles. Thus, the Radon transform is first applied to the input image to obtain the projections. Then, all the projections are thresholded individually to generate code sections, which are concatenated to build the Radon Barcode. A simple way of thresholding a projection is to calculate a typical value using the median operator applied to all non-zero values of that projection [53]. Algorithm 1 shows how RBC works [53] and in Figure 4 a graphic example is shown.
Algorithm 1: Radon Barcode Generation [53]
  Initialize Radon Barcode r ← ∅
  Initialize angle θ and R_N = C_N ← 32
  Normalize the input image I = Normalize(I, R_N, C_N)
  Set the number of projection angles, e.g., n_p ← 8
  For each of the n_p projection angles θ:
    Compute the projection of I at angle θ
    Threshold the projection using the median of its non-zero values
    Append the resulting binary code section to r
  Return r

Until now, the main application of Radon Barcodes has been medical image retrieval, where they have given high accuracy. Since the recognition of facial expressions requires robustness to changes in orientation, illumination, and scale, we consider that the RBC descriptor can be a good technique to provide a reliable classification of pain/non-pain in infants using facial images, this being the first time that RBC is used in this kind of application.
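The barcode construction can be sketched in Python under simplifying assumptions. To avoid interpolation, this sketch is restricted to four directions (vertical, horizontal, and the two diagonals), for which the projections of a square image reduce to sums along columns, rows, and diagonals. The size normalization step is omitted, the comparison "greater than or equal to the median" is an assumed binarization rule, and all function names are hypothetical. It is not the implementation used in this work.

```python
from statistics import median


def projections_4(img):
    """Discrete projections of a square image along four directions."""
    n = len(img)
    cols = [sum(img[r][c] for r in range(n)) for c in range(n)]  # vertical lines
    rows = [sum(row) for row in img]                             # horizontal lines
    diag = [0] * (2 * n - 1)
    anti = [0] * (2 * n - 1)
    for r in range(n):
        for c in range(n):
            diag[r - c + n - 1] += img[r][c]  # lines parallel to the main diagonal
            anti[r + c] += img[r][c]          # lines parallel to the anti-diagonal
    return [cols, anti, rows, diag]


def radon_barcode(img):
    """Concatenate the individually thresholded projections into one binary code."""
    barcode = []
    for p in projections_4(img):
        nonzero = [v for v in p if v != 0]
        if not nonzero:
            # An empty projection yields an all-zero code section.
            barcode.extend([0] * len(p))
            continue
        # Typical value: median of the non-zero entries of this projection.
        threshold = median(nonzero)
        barcode.extend(1 if v >= threshold else 0 for v in p)
    return barcode


# Hypothetical 4x4 image with a bright 2x2 block in the center.
image = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
barcode = radon_barcode(image)
print(barcode)
```

A general implementation would compute projections at arbitrary angles with a Radon transform routine from an image processing library instead of the four hand-written directions used here.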

Classification: Support Vector Machines
To properly classify the features extracted using any of the descriptors defined above, Support Vector Machines (SVM) are chosen.
The main idea of SVM is to select a hyperplane that is equidistant to the training examples of every class to be classified so that the so-called maximum margin hyperplane between classes is obtained [54,55]. To define this hyperplane, only the training examples of each class that fall right next to those margins, which are called support vectors, are taken into account. In this work, this hyperplane would be the one which separates the features obtained from pain and non-pain facial images. In cases where a linear function does not allow for separating the examples properly, a nonlinear SVM is used. To define the hyperplane in this case, the input space of the examples X is transformed into a new one, Φ(X), where a linear separation hyperplane is constructed using kernel functions, as represented in Figure 5. A kernel function K(x, x') is a function that assigns to each pair of elements x, x' ∈ X a real value corresponding to the scalar product of the transformed versions of those elements in the new space. There are several types of kernel, such as:
• P-Grade polynomial kernel: $K(x, x') = (\gamma \langle x, x' \rangle + \tau)^p$
• Gaussian kernel: $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$
where γ > 0 is a scaling parameter and τ is a constant. The selection of the kernel depends on the application and situation, and a linear kernel is recommended when the linear separation of data is simple. In the rest of the cases, it will be necessary to experiment with the different functions to obtain the best model for each case, since kernels use different algorithms and parameters.
Once the hyperplane is obtained, it will be transformed back into the original space, thus obtaining a nonlinear decision boundary [2].
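As a small illustration of the kernel functions mentioned in this section, the following Python sketch evaluates a polynomial and a Gaussian kernel on plain lists of numbers. It assumes the common forms K(x, x') = (γ⟨x, x'⟩ + τ)^p and K(x, x') = exp(−γ‖x − x'‖²); the function names and default parameter values are hypothetical, and this is not the MATLAB classifier used in this work.

```python
import math


def polynomial_kernel(x, y, p, gamma=1.0, tau=0.0):
    """Polynomial kernel of degree p: (gamma * <x, y> + tau) ** p."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + tau) ** p


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]
print(polynomial_kernel(u, v, p=2, gamma=1.0, tau=1.0))
print(gaussian_kernel(u, v, gamma=0.5))
```

The Gaussian kernel equals 1 for identical vectors and decays towards 0 as the vectors move apart, with γ controlling how fast the decay is.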

The Proposed Method
Our application has been implemented in MATLAB© R2017. The toolboxes that have been used are Statistics and Machine Learning and Computer Vision System. As mentioned in Section 1, for the development of the tool, the Infant COPE database [3] has been used. This database is composed of 195 color images of 26 neonates (13 boys and 13 girls) aged between 18 hours and 3 days. For the images, the neonates were exposed to the pain of the heel test and to three non-painful stimuli: a bodily disturbance (movement from one cradle to another), air stimulation applied to the nose, and the rubbing of the heel with wet cotton. In addition, images of resting infants were taken.
As mentioned before, this implementation could be applied to a mobile device and/or a wearable system: on the one hand, a baby monitor would continuously analyze the images it captures; on the other hand, the parents or medical staff would wear a bracelet or have a mobile application to warn them when the baby is suffering pain. The diagram in Figure 6 shows a possible example of the implementation stages. The first step is pre-processing the input image by detecting the infant's face, resizing the resulting image, and converting it into grey scale. All images are normalized to a size of 100 × 120 pixels. Afterwards, features are extracted using the texture descriptors mentioned before. The NFCS scale is followed, so descriptors are applied only to the facial areas relevant to the NFCS scale: right eye, left eye, mouth, and brow. These areas are manually selected with sizes of 30 × 50 pixels for the eyes, 40 × 90 pixels for the mouth, and 15 × 40 pixels for the brow. The small size of the database made it possible to carry out an analysis to find the ideal size for each area. Feature vectors from each area are concatenated to obtain the global descriptor.
Finally, a previously trained SVM classifier decides if the input frame corresponds with a baby in pain or not. The system will be continuously monitoring the video frames obtained and sending an alarm to the mobile device if a pain expression is detected.
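The region-based feature extraction stage can be sketched in Python as follows. The region sizes are those stated above, interpreted here as height × width on a face image of 120 rows and 100 columns, which is an assumption. The region positions are hypothetical, since in this work the areas are selected manually, and a coarse grey-level histogram stands in for the LBP, LTP, or RBC descriptor.

```python
def crop(img, top, left, height, width):
    """Extract a rectangular region from an image stored as a list of rows."""
    return [row[left:left + width] for row in img[top:top + height]]


def intensity_histogram(region, bins=8):
    """Stand-in descriptor: coarse grey-level histogram (not LBP, LTP or RBC)."""
    hist = [0] * bins
    for row in region:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
    return hist


# (top, left, height, width); sizes follow the text, positions are hypothetical.
REGIONS = {
    "right_eye": (30, 0, 30, 50),
    "left_eye": (30, 50, 30, 50),
    "brow": (12, 30, 15, 40),
    "mouth": (75, 5, 40, 90),
}


def global_descriptor(img, descriptor=intensity_histogram):
    """Concatenate the feature vectors of all NFCS-relevant regions."""
    features = []
    for top, left, height, width in REGIONS.values():
        features.extend(descriptor(crop(img, top, left, height, width)))
    return features


# Synthetic grey-scale face image (120 rows x 100 columns) used only as input data.
face = [[(r + c) % 256 for c in range(100)] for r in range(120)]
vector = global_descriptor(face)
print(len(vector))
```

The resulting vector would then be passed to the trained classifier; any of the three texture descriptors could replace the stand-in by supplying a different `descriptor` function.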

Results
In this section, three different methods for feature extraction are compared: Local Binary Patterns, Local Ternary Patterns, and Radon Barcodes. According to the results obtained in [2], a Gaussian kernel has been chosen for SVM classification, since it provides the best behavior for the Infant COPE database. The SVM has been trained with 13 pain images and 13 non-pain images, and the tests have been performed with 30 pain images and 93 non-pain images that were not used in the training stage. The unbalanced number of images is due to the number of pictures of each class available in the database.
To evaluate the tests, confusion matrices, cross-validation and error rate have been used. In this case, error rate has been calculated as the number of incorrect predictions divided by the total number of evaluated predictions.
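The error rate definition can be checked with a short Python sketch, using the test set size stated above (30 pain and 93 non-pain images) and the false positive and false negative counts reported for each descriptor in the following subsections. The function and variable names are illustrative.

```python
def error_rate(false_positives, false_negatives, total_predictions):
    """Incorrect predictions divided by the total number of evaluated predictions."""
    return (false_positives + false_negatives) / total_predictions


TOTAL_TEST_IMAGES = 30 + 93  # pain + non-pain test images

# (false positives, false negatives) as reported for each descriptor.
reported = {"LBP": (3, 10), "LTP": (10, 3), "RBC": (3, 3)}

rates = {
    name: round(100 * error_rate(fp, fn, TOTAL_TEST_IMAGES), 2)
    for name, (fp, fn) in reported.items()
}
for name, rate in rates.items():
    print(f"{name}: error rate {rate}%, recognition rate {round(100 - rate, 2)}%")
```

Since the error rate only counts incorrect predictions, LBP and LTP obtain the same value even though their false positives and false negatives are swapped.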

Results on LBP
The parameters to be considered in the LBP descriptor are the radius, the number of neighbors, and the cell size. As mentioned before, images have been previously cropped into four different areas. According to the previous results in [2], the best recognition rate is obtained when each of these areas is not divided into cells. Therefore, the recognition rates for all the possible combinations of radius 1, 2, and 3 and 8, 10, 12, 16, 18, 20, and 24 neighbors have been calculated to select the optimum values. As shown in Figure 7, the parameters with the best recognition rate are radius 2 and 18 neighbors. For this combination, the confusion matrix CM_LBP contains three false positives and 10 false negatives, giving an error rate of 10.57% and, therefore, a recognition rate of 89.43%.

Results on LTP
In this case, the parameters to be calculated in the LTP descriptor are the same as in LBP, plus the threshold t. The parameter values which gave the best result for LBP (radius 2 and 18 neighbors) have been kept, and values from t = 1 to 10 have been tested for the threshold.
As shown in Figure 8, the best result is obtained for threshold t = 6. For this value, the confusion matrix CM_LTP contains 10 false positives and three false negatives, giving an error rate of 10.57% and, therefore, a recognition rate of 89.43%.

Results on RBC
The parameter to be calculated in the RBC method is the number of projection angles. To do this, the typical values 4, 8, 16, and 32, as considered in [53], have been chosen. The results of the tests carried out are shown in Figure 9. As can be seen, the best result is obtained with four projections. For this value, the confusion matrix CM_RBC contains three false positives and three false negatives, giving an error rate of 4.88% and, therefore, a recognition rate of 95.12%.

Final Results and Discussion
As shown throughout this section, the best results are obtained by RBC with a recognition rate of 95.12%, followed by LBP and LTP with a recognition rate of 89.43%. These results show the validity of applying Radon Barcodes to facial emotion recognition and, in line with the properties discussed in Section 2, it can be concluded that RBC is a reliable texture descriptor that is robust to noise and invariant to scale and rotation.
Taking into account the cross-validation error of each method, LBP obtains 7.69%, LTP 19.23%, and RBC 11.54%. With these results, it can be said that, in terms of being independent from the training images, LBP is better than LTP and RBC. Considering the runtime to identify pain in an input image, LBP takes around 20 ms to process a frame, LTP around 300 ms, and RBC around 30 ms. Therefore, in terms of cross-validation error and execution time, LBP obtains better results. However, RBC behaves much better in terms of recognition rate. In Table 1, there is a summary of the obtained results. Considering that videos typically work at 25-30 frames per second, it can be said that both LBP and RBC would be able to analyze all frames captured in a second, allowing the system to be integrated in a mobile app or a wearable device. However, since facial expressions do not change drastically in less than a second, the recognition process would not lose accuracy by analyzing just a few frames per second instead of 25-30. This would also reduce the workload, resulting in a more efficient tool in terms of speed.
Finally, in Table 2, there is a comparison between our research and some previous works. All of these works have made use of the Infant COPE database and different feature extraction methods and classifiers such as texture descriptors, deep learning methods, or supervised learning methods. From the comparison in Table 2, it can be observed that the proposed method with Radon Barcodes achieves the best recognition rate, more than 10% higher than previous works using the same database. Therefore, it can be said that the proposed method can be used as a reliable tool to classify infant facial expressions as pain or non-pain. Moreover, its processing time makes it feasible to implement it in a mobile app or a wearable device.
From the results in Table 2, it must also be pointed out that different studies using the same algorithms may report different recognition rates. This may be the result of the pre-processing stage in each work or of the input parameters of the different feature extraction methods and/or the classifier used.
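The relation between the per-frame processing times and the video frame rate can be made explicit with a few lines of Python. The timings are the approximate values reported above; the function name is illustrative, and the calculation assumes frames are processed one after another with no other overhead.

```python
def max_frames_per_second(ms_per_frame):
    """Upper bound on the frames that can be processed in one second."""
    return 1000.0 / ms_per_frame


# Approximate per-frame processing times reported above, in milliseconds.
timings_ms = {"LBP": 20, "LTP": 300, "RBC": 30}

for name, ms in timings_ms.items():
    fps = max_frames_per_second(ms)
    print(f"{name}: up to {fps:.1f} frames/s, covers 30 fps video: {fps >= 30}")
```

Under this assumption, LBP and RBC stay above 30 frames per second, while LTP would only be able to process about three frames per second.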

Conclusions
In this paper, a tool to identify infants' pain using machine learning has been implemented. The system achieves a recognition rate of 95.12% when using Radon Barcodes. This is the first time that RBC is used to recognize facial expressions, which proves the validity of the Radon Barcodes algorithm for the identification of emotions. In addition, as shown in Table 2, Radon Barcodes improved the recognition results compared to other recently proposed methods. Furthermore, the time needed to process frames for pain recognition with RBC makes it possible to use our system in a real mobile application.
In relation to this, we are currently working on implementing the tool in real time and designing a real wearable device to detect pain from facial images. We are beginning a collaboration with some hospitals to perform different tests and develop a prototype of the final system. Finally, we are also working with other infant databases and with datasets of other ages to check the functionality and validity of the implemented tool, and the definition of a parameter to estimate the degree of pain is also under research.